The work presented in this thesis deals with the development of fast and
fairly accurate computer-aided design software for simulating very-large-
scale-integrated (VLSI) circuits. The methods rely on piecewise linearization
of the nonlinear elements in the circuits.

The  piecewise  linear  approaches  explored  in  this  work  are 

1.  A  fast  piecewise  linear  Gauss-Seidel  waveform  relaxation  method. 

2.  A  slower  but  more  accurate  piecewise  linear  method  based  on  simplices. 

3.  A  Gauss-Seidel  piecewise  linear  method  with  dynamic  partitioning. 

Also described is a mixed method which combines the fast piecewise linear
method and the dynamic partitioning method. The circuit to be analyzed is
partitioned into dc-connected subcircuits and then sequenced for analysis.
Small subcircuits are solved using the fast piecewise linear method, while
large subcircuits, including the strongly connected components in the circuit,
are solved using the dynamic partitioning method.

A parallel implementation of the Gauss-Seidel piecewise linear method
with dynamic partitioning on a uniprocessor computer is studied. Algorithms
for the parallel implementation of the dynamic partitioning approach on a
multiprocessor with shared memory (Alliant FX/8) are also explained in detail.
The piecewise linear methods presented in this work have been implemented
in a set of programs called PLATINUM. The waveforms generated by PLATINUM
are fairly accurate compared to those of SPICE2, and the speedup on a
uniprocessor machine is over two orders of magnitude, while the parallel
implementation gives an additional 4 to 6 times speed improvement.
CHAPTER 1


INTRODUCTION 


Fabrication of integrated circuits is expensive, and errors encountered after
the process is completed cannot be corrected. Therefore, before a circuit is
fabricated, it is important to design it as well as possible and then
simulate its operation to check whether the performance matches the
specifications. In general, simulation can be divided into classes
corresponding to the different levels of the design:

1.  functional  level  simulation 

2.  register  transfer  level  simulation 

3.  logic  simulation 

4.  timing  simulation 

5.  circuit  simulation

6.  device  simulation

7.  process  simulation

Although simulation at each of these levels is important for successful
design, this work is aimed at developing fast and reliable methods for circuit
and timing simulations of large-scale circuits. Before describing the new
methods, well-established techniques as well as recently proposed ones are
briefly reviewed below.

Circuit simulators, such as SPICE2 [1] and ASTAP [3o], provide accurate
results. These standard circuit simulators basically follow the procedure
indicated below:


1.  Transform  the  nonlinear  differential-algebraic  equations  describing  the 
dynamic  behavior  of  the  circuit  into  nonlinear  algebraic  equations 
using  implicit  integration  methods. 

2.  Generate  linear  equations  by  iteratively  applying  the  Newton-Raphson 
formula  to  the  nonlinear  algebraic  equations. 

3.  Solve  the  linear  equations  at  each  Newton-Raphson  iteration  using 
sparse  Gaussian  elimination  techniques. 

More recent circuit simulators apply tearing or partitioning methods to
lower the computation time. Tearing refers to breaking the original system into
subsystems, solving each subsystem separately, and then taking care of the
interconnections among them. The main advantage of dividing the original
network into subnetworks is that the inactivity or latency of the subcircuits
can be exploited. It has been observed that inactivity or latency in large
digital circuits accounts for up to 80 percent of the network variables. The
numerical convergence and stability properties of tearing methods are the same
as those of standard circuit simulators, provided direct methods are used to
solve the partitioned equations. One well-known tearing method is equivalent to
reordering the system variables into bordered block diagonal (BBD) form [51]

$$\begin{bmatrix} D & P \\ Q^T & T \end{bmatrix} \begin{bmatrix} v \\ w \end{bmatrix} = \begin{bmatrix} y \\ s \end{bmatrix}$$

where $w \in R^k$ is the vector of tearing variables and $v \in R^m$ is the
vector of the rest of the variables. $T$ is a $k \times k$ tearing matrix, and
$D$ is an $m \times m$ block diagonal matrix. The tearing variables $w$ are
solved for by eliminating the variables $v$:

$$(T - Q^T D^{-1} P)\, w = s - Q^T D^{-1} y$$

Then the rest of the variables $v$ are solved for:

$$D_i v_i = y_i - P_i w$$

where the subscript $i$ indicates the $i$th subcircuit.
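The elimination just described can be illustrated with a small sketch. This is not code from SLATE or PLATINUM; the function names `solve` and `bbd_solve`, the dense list-of-lists storage, and the numerical values are assumptions made only for illustration. Each diagonal block $D_i$ is handled on its own, the reduced system in the tearing variables $w$ is assembled and solved, and each subcircuit is then back-solved independently.

```python
def solve(A, b):
    """Solve the dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in A[r]] + [float(b[r])] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def bbd_solve(D, P, Q, T, y, s):
    """Solve a bordered block diagonal system block by block.

    D[i], P[i], Q[i], y[i] belong to subcircuit i; Q[i] is stored untransposed
    (n_i rows, k columns), so Q[i]^T multiplies the subcircuit variables.
    """
    k = len(T)
    S = [[float(t) for t in row] for row in T]   # becomes T - Q^T D^-1 P
    rhs = [float(val) for val in s]              # becomes s - Q^T D^-1 y
    for Di, Pi, Qi, yi in zip(D, P, Q, y):
        ni = len(Di)
        z = solve(Di, yi)                                                  # D_i^-1 y_i
        Z = [solve(Di, [Pi[r][j] for r in range(ni)]) for j in range(k)]   # columns of D_i^-1 P_i
        for a in range(k):
            rhs[a] -= sum(Qi[r][a] * z[r] for r in range(ni))
            for j in range(k):
                S[a][j] -= sum(Qi[r][a] * Z[j][r] for r in range(ni))
    w = solve(S, rhs)                            # tearing variables
    v = [solve(Di, [yi[r] - sum(Pi[r][j] * w[j] for j in range(k))
                    for r in range(len(Di))])
         for Di, Pi, yi in zip(D, P, y)]         # D_i v_i = y_i - P_i w
    return v, w


# Two hypothetical subcircuits (sizes 2 and 1) coupled through one tearing variable.
D = [[[4, 1], [1, 3]], [[5]]]
P = [[[1], [0]], [[2]]]
Q = [[[0], [1]], [[1]]]
T = [[6]]
y = [[1, 2], [3]]
s = [4]
v, w = bbd_solve(D, P, Q, T, y, s)
```

In a simulator that exploits latency, the loop over subcircuits would presumably skip blocks whose variables have not changed; that bookkeeping is omitted here.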

One example of a circuit simulator that uses the above tearing method is
SLATE [2]. Because only a small percentage of the subnetworks are active at a
particular time, and hence only a few subnetworks need to be analyzed, this
method can provide savings in computation time.

Another way to save computation time is to apply a relaxation-based solution
method [11], which can also be considered a form of tearing. An example of a
circuit simulator that utilizes a relaxation method is RELAX [11], which
solves the equations at the nonlinear algebraic-differential equation level.
In RELAX [11], while solving for the unknown variables assigned to each
subsystem over the time period $[t_1, t_2]$, the unknown variables not
assigned to that particular subsystem are relaxed to their waveforms from
previous iterations. The advantage inherent in the waveform relaxation method
is that each subsystem can be solved using its own time step, and thus latency
can be exploited in a natural way. The main disadvantage is that the method
converges very slowly for subsystems with strong coupling among them.
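As a rough illustration of the waveform relaxation idea (not the RELAX implementation), the sketch below applies Gauss-Seidel waveform relaxation to two weakly coupled linear equations. The test system, the fixed backward Euler step shared by both subsystems, and all names are assumptions chosen to keep the example small; a real waveform relaxation simulator would let each subsystem pick its own time step.

```python
def gauss_seidel_wr(a, b, c, u, h, steps, tol=1e-12, max_sweeps=100):
    """Gauss-Seidel waveform relaxation for the hypothetical system
         x1' = -a*x1 + c*x2 + u,   x2' = -b*x2 + c*x1,   x1(0) = x2(0) = 0,
    with each scalar equation discretized by backward Euler (step h)."""
    x1 = [0.0] * (steps + 1)   # initial waveform guess: all zeros
    x2 = [0.0] * (steps + 1)
    for sweep in range(1, max_sweeps + 1):
        change = 0.0
        # Subsystem 1 over the whole interval; x2 is relaxed to its waveform
        # from the previous sweep.
        for n in range(steps):
            new = (x1[n] + h * (c * x2[n + 1] + u)) / (1.0 + h * a)
            change = max(change, abs(new - x1[n + 1]))
            x1[n + 1] = new
        # Subsystem 2 over the whole interval; it uses the x1 waveform just
        # computed, which is the Gauss-Seidel ordering.
        for n in range(steps):
            new = (x2[n] + h * c * x1[n + 1]) / (1.0 + h * b)
            change = max(change, abs(new - x2[n + 1]))
            x2[n + 1] = new
        if change < tol:
            return x1, x2, sweep
    raise RuntimeError("waveform relaxation did not converge")


def coupled_backward_euler(a, b, c, u, h, steps):
    """Reference: solve both equations together at every time point."""
    x1, x2 = [0.0], [0.0]
    det = (1.0 + h * a) * (1.0 + h * b) - (h * c) ** 2
    for _ in range(steps):
        r1 = x1[-1] + h * u
        r2 = x2[-1]
        x1.append(((1.0 + h * b) * r1 + h * c * r2) / det)
        x2.append((h * c * r1 + (1.0 + h * a) * r2) / det)
    return x1, x2


wr1, wr2, sweeps = gauss_seidel_wr(2.0, 2.0, 0.5, 1.0, 0.1, 20)
ref1, ref2 = coupled_backward_euler(2.0, 2.0, 0.5, 1.0, 0.1, 20)
```

With the weak coupling used here (`c` small relative to `a` and `b`) only a handful of sweeps are needed; increasing `c` toward `a` and `b` makes the number of sweeps grow, which is the slow-convergence behaviour noted above.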

Another way to reduce simulation time is to use timing simulators,
switched-level simulators [3,10], or timing verifiers [14,15]. Timing
simulators use methods similar to those used in circuit simulators, while
switched-level simulators and timing verifiers use approaches that are
completely different. The speed and accuracy of these simulators cover a broad
range; in general, switched-level simulators and timing verifiers are faster
than the circuit-oriented timing simulators; however, switched-level
simulators and timing verifiers are less accurate. Some examples of timing
simulators that use approaches similar to those of circuit simulators are
MOTIS [5], MOTIS-C [6], MOTIS II [7], SPLICE [41] and PREMOS [8]. To reduce the
computation time at each time point, a one-sweep Gauss-Jacobi method is used in
MOTIS [5] and a one-sweep Gauss-Seidel approach in MOTIS-C [6]; i.e., the
iteration is not carried out until convergence. In SPLICE [41] the relaxation
method is applied to the nonlinear difference equations. It is similar to
MOTIS-C [6] except that the iterations are carried out until convergence or
until the number of iterations exceeds some predetermined value. In the latter
case the time step is reduced and the calculation is repeated. The iterations
are performed to achieve accuracy and convergence. PREMOS [8] applies a
Gauss-Seidel method similar to the one used in MOTIS-C [6], except that the
unknown variables in the Gauss-Seidel formulation are predicted based on
previous values.

Switched-level simulators are somewhat related to logic simulators in that
they use levels defined as 0, 1 and X (X is the undefined or unknown level).
Nodes in a circuit are assigned strengths which determine whether the nodes
can affect or be affected by other nodes. Each transistor in the circuit is
assigned a state. During the analysis the states of the transistors are first
held fixed and the nodes are updated; the transistor states are then modified
while the node states are kept fixed. An example of a simulator that uses this
procedure is MOSSIM [3]. Another approach to switched-level simulation is
presented in [4] and is implemented in the simulator EXPRESS. The method relies
on the evaluation of symbolic logic expressions which are generated
automatically by the simulator. The method is also able to handle faults
injected into the circuits. Another type of switched-level simulator is
MOSTIM [10]. In this case, the third level X represents a state that is above a
chosen low-threshold level and below another chosen high-threshold level. Note
that this level contains timing information, while the X state of the other
switched-level simulators (MOSSIM, EXPRESS) only represents undefined or
unknown values, which means that the X level can also be 0 or 1. In MOSTIM,
delay tables for a basic inverter circuit and for an inverter with a
transmission gate are constructed using circuit simulation runs in a
preprocessing step. Delay information is then extracted from these runs. The
delays of nonstandard primitives are obtained from the tables by scaling
existing primitives. The simulator is in many cases over two orders of
magnitude faster than SPICE, and the X level provides fairly accurate timing
information. One drawback is that the tables require a large memory space and
have to be constructed for each technology.

Timing verifiers, on the other hand, determine the timing of critical paths
in a circuit. Timing verifiers use methods that are signal-value independent.
However, timing verifiers may report false critical paths, that is, paths that
are never activated in reality. To handle this weakness some mechanisms are
incorporated in the timing verification programs. Two examples of timing
verifiers are Crystal [14] and TV [15]. The difference between the two is that
Crystal employs a depth-first search in determining the critical paths, while
TV uses a breadth-first search. The timing or delay calculation of the critical
path is based on approximating the transistors by linear resistors and then
determining the dynamics of the resulting RC network based on some RC time
constant approximation, such as the one suggested by Penfield and Rubinstein
[21].

In this study a new method for fast timing simulation based on piecewise
linearized transistor models is developed. The method has computational speed
comparable to that of switch-level timing simulation, and at the same time
produces waveforms close to those produced by standard circuit simulation. The
use of piecewise linear (pwl) techniques for time-domain analysis of electronic
circuits is not new [32]. It has been used by Hajj and Skelboe in [12], where
the numerical properties of implicit integration formulas are analyzed when
applied to the solution of pwl systems without partitioning. In [28] Laplace
transform techniques are applied to compute the solution in the linear regions
of the pwl equations. In [13] Kaye and Sangiovanni-Vincentelli use Laplace
transforms and the Gauss-Jacobi method to compute the solutions of pwl systems
of equations, where the set of equations is partitioned into systems of scalar
equations. A major time-consuming step when applying the Laplace transform
method to the solution of pwl equations is the computation of the intersection
of the solution trajectories with the region boundaries. In [37] a Gauss-Seidel
technique is used to solve pwl circuits. In this case the circuit partitions
are fixed; in addition, Gauss-Seidel techniques are used to solve the pwl
equations within each partition. The method can thus be too slow when strong
coupling exists among the circuit variables.

More recently there have been a few papers dealing with methods related
in some respects to pwl techniques, most notably Elogic [16,17] and Cinnamon
[18], related in the sense that linear or pwl transistor models are used. These
two simulators will be described next.


In Elogic [16,17] the transistor model used is the small-signal model
linearized at the operating point, or line-thru-origin model. An n-dimensional
table consisting of a Norton equivalent circuit for each output node as a
function of the controlling voltage states is constructed. Unlike the method
applied in a conventional simulator, Elogic discretizes the voltage level,
calculates the total conductance and total current at each node, and
determines the time when the next discretized voltage level is crossed. The
time increment $\Delta t$ is computed as follows:

$$\Delta t = \frac{C_N \times \Delta V}{I_N - (V_N \times G_N)}$$

where $C_N$ is the capacitance at node $N$, $I_N$ is the total current at node
$N$, $V_N$ is the voltage at node $N$ at the present time point, and $G_N$ is
the total conductance obtained from the table for node $N$. Only transitions
between adjacent states are allowed by Elogic. Since the waveform relaxation
iteration is not carried out until convergence, Elogic might make a wrong
transition to a new voltage state. The solution to this problem is to use small
voltage steps. A better version (Elogic2), which applies the trapezoidal method
for discretizing the time derivative and solves strongly coupled nodes
together, was developed. Solving strongly coupled nodes together eliminates the
nonconvergence problem of the waveform relaxation method as applied in Elogic1.
Since it is more expensive to use Elogic2, the program is used during analysis
only when Elogic1 fails.
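The time-increment formula can be exercised with a short sketch. The function name, the argument names, and the handling of a zero or negative net current (returning a direction flag alongside the time) are assumptions for illustration and are not taken from the Elogic papers.

```python
import math


def elogic_time_step(c_n, delta_v, i_n, v_n, g_n):
    """Time until node N crosses the next discretized voltage level.

    Implements dt = C_N * dV / (I_N - V_N * G_N) for a positive step size dV.
    The sign of the net current decides whether the node moves up (+1) or
    down (-1); with zero net current no level is ever crossed (assumption).
    """
    net_current = i_n - v_n * g_n
    if net_current == 0.0:
        return math.inf, 0
    direction = 1 if net_current > 0.0 else -1
    return c_n * delta_v / abs(net_current), direction


# Hypothetical node: 1 pF, 0.1 V steps, 1 mA injected, 1 V present, 0.5 mS load.
dt, direction = elogic_time_step(1e-12, 0.1, 1e-3, 1.0, 5e-4)
```

For these values the net charging current is 0.5 mA, so the node needs about 0.2 ns to rise by one 0.1 V step.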

Cinnamon uses a method similar to the one used in Elogic in that the
voltage level is discretized. However, the transistors are linearized each time
a discretized voltage level is crossed (at each "event"), rather than obtaining
the transistor model information from tables. The time when a voltage level is
crossed is determined by approximating the solution obtained using the Laplace
transform method. The approximation is that if the amplitude of the exponential
term corresponding to the smallest (in absolute value) of the system
eigenvalues is smaller than the voltage step $\Delta V$, then this term is the
dominating term of the solution. This method of solution gives more accurate
results than the approach of Elogic, but the use of the Laplace transform
method could slow down the solution process.

There are three pwl approximation methods described in this study. The
three methods construct pwl models at the outset in a preprocessing step of the
simulation, as is done in Hajj and Skelboe [12] and Kaye and
Sangiovanni-Vincentelli [13], so it is not necessary to linearize frequently as
is done in Cinnamon. Compared to the tables for the transistor models used in
Elogic, the tables in our approach are smaller, since they are one-dimensional
and fewer breakpoints are needed.

The first method is a modification of the Chien and Kuh method of
performing pwl analysis on simplices [40]. In the original method there is no
implication of piecewise linearizing the network elements, but rather the
method is applied to general pwl functions. In our case both the network
elements and the solution curve are piecewise linearized. There are some
advantages to using this method. It is simpler than the more common Katzenelson
method [29], in that there is no need to explicitly calculate the boundary
crossings when the solution curve enters a new pwl region. Moreover, there is
no need for the function to have a derivative; that is, it is not necessary to
construct the Jacobian matrix as is done in the Newton-Raphson method. In fact,
the function describing the device characteristics need not be known. Only data
points on the current-voltage characteristic curves are needed. In addition,
the convergence of the method is the same as that of Katzenelson's method.

The second method combines a fast pwl method and the waveform relaxation
approach. This method is based on the work of Hajj and Jung [39]. The idea is
to partition the system into a set of scalar pwl dynamic equations, solve each
equation by inspection, and iterate using the Gauss-Seidel waveform relaxation
approach until convergence. It is found, however, that for strongly coupled
nodes the method proposed in [39] converges very slowly. A modification to the
original method is described in the next chapter.

The third method is a completely novel one, which dynamically partitions
the network during the analysis so that the resulting linear matrix
representing the piecewise linearized circuit is as block-diagonal as possible.
The dynamic partitioning involves the comparison of integers representing
regions of transistor operation. Fast computation speed without much loss of
accuracy has been obtained using the third approach. Another good feature of
the method is the inherent parallelism of the block-diagonal form that results
from the dynamic partitioning, so that parallel processing can be used
efficiently.

The pwl transistor model and the first and second pwl methods mentioned
above, namely, the pwl method on simplices and the Gauss-Seidel pwl WR
approach, are explained in Chapter 2. Chapter 3 is devoted to dynamic
partitioning methods. An implementation of the dynamic partitioning method on
parallel processors is described in Chapter 4. The implementation of the
approaches for sequential and for parallel machines and some examples are
given in Chapter 5. Conclusions and suggestions for future work are given in
the final chapter. A modification to the pwl transistor model to incorporate
short-channel effects is described in Appendix A. A brief description of the
program PLATINUM, which is an implementation of the dynamic partitioning
approach, is given in Appendix B.
CHAPTER 2


PIECEWISE  LINEAR  SOLUTION  METHODS 


2.1.  Introduction 

Simulating entire VLSI circuits using standard circuit simulation programs
such as SPICE is very time-consuming, due to the large size of the circuits.
Pwl methods could be attractive because they simplify nonlinear model
representation and would therefore reduce model evaluation time considerably.
In addition, some pwl methods offer better convergence properties than the
standard Newton-Raphson method used in standard circuit simulators such as
SPICE. In this chapter, two pwl methods, together with their advantages and
drawbacks, are explained.

The currents and voltages in a circuit are governed by the following
equations:

$$\text{(KCL)} \qquad A\, i_b = 0 \qquad (2.1.a)$$

$$\text{(KVL)} \qquad v_b = A^T v_n \qquad (2.1.b)$$

$$\text{(resistors)} \qquad i_{r_1} = f_{r_1}(v_{r_1}, i_{r_2}) \qquad (2.1.c)$$

$$\text{(resistors)} \qquad v_{r_2} = f_{r_2}(v_{r_1}, i_{r_2}) \qquad (2.1.d)$$

$$\text{(capacitors)} \qquad q_c = f_c(v_c), \quad i_c = \frac{dq_c}{dt} \qquad (2.1.e)$$

$$\text{(inductors)} \qquad \phi_l = f_l(i_l), \quad v_l = \frac{d\phi_l}{dt} \qquad (2.1.f)$$

where $i_b$ is the set of currents in the $b$ branches of the circuit, $v_b$ is
the set of voltages across the branches, $v_n$ is the set of $n$ node-to-datum
voltages, and $A$ is an $n \times b$ reduced incidence matrix which contains
+1, -1 and 0 entries. $v_{r_1}$ and $i_{r_2}$ are the voltages across and the
currents through the resistors, $q_c$ is the charge of the capacitors, $v_c$ is
the voltage across the capacitors, $\phi_l$ is the flux of the inductors, and
$i_l$ is the current through the inductors. The tableau equations in (2.1) may
be reduced to a smaller set using, for example, the modified nodal approach
[46].
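As a small worked instance of the first two tableau equations, the sketch below builds the reduced incidence matrix of a hypothetical three-branch circuit and checks KVL and KCL numerically. The circuit, the sign convention (+1 when a branch leaves a node, -1 when it enters) and the helper names are assumptions for illustration.

```python
def mat_vec(M, x):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


# Hypothetical circuit with n = 2 nodes (plus ground) and b = 3 branches:
#   branch 1: node 1 -> ground (5 V source)
#   branch 2: node 1 -> node 2 (3 ohm resistor)
#   branch 3: node 2 -> ground (2 ohm resistor)
# The ground row is dropped, which gives the reduced incidence matrix.
A = [[1, 1, 0],
     [0, -1, 1]]

v_n = [5.0, 2.0]                      # node-to-datum voltages
v_b = mat_vec(transpose(A), v_n)      # KVL: v_b = A^T v_n

i2 = v_b[1] / 3.0                     # resistor branch currents from Ohm's law
i3 = v_b[2] / 2.0
i1 = -i2                              # the source branch carries the return current
i_b = [i1, i2, i3]

kcl_residual = mat_vec(A, i_b)        # KCL: A i_b = 0
```

Here `v_b` evaluates to the branch voltages 5 V, 3 V and 2 V, and both entries of `kcl_residual` are zero, so the chosen node voltages are consistent with the element equations.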

Since the work presented here is based on piecewise linearization of the
nonlinear elements in the circuit, pwl modeling of nonlinear elements,
represented by $f_{r_1}$, $f_{r_2}$, $f_c$ and $f_l$ in (2.1), will be
explained in the following sections. Note that the functions in (2.1) include
linear elements and independent sources. These elements, of course, need not be
piecewise linearized.

2.2.  Piecewise  Linear  Modeling 

2.2.1.  Two  terminal  elements 

The pwl approximation of the nonlinear characteristic of a 2-terminal
element is shown in Figure 3. In this case the pwl curve is characterized by a
set of breakpoints. The breakpoints define region boundaries. In each region
the equation is as follows:

$$y = J_i x + h_i$$

where the subscript $i$ indicates the region number. The number of breakpoints
and their locations determine the accuracy of the pwl approximation with
respect to the original function.
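A one-dimensional pwl table of this kind can be sketched as follows. The breakpoint values, the function names, and the choice to extend the first and last segments beyond the outermost breakpoints are assumptions for illustration; each region stores a slope (the $J_i$ of the region equation) and an intercept (the $h_i$).

```python
from bisect import bisect_right


def build_pwl(breakpoints, values):
    """Return one (slope, intercept) pair per segment between consecutive breakpoints."""
    segments = []
    for k in range(len(breakpoints) - 1):
        x0, x1 = breakpoints[k], breakpoints[k + 1]
        y0, y1 = values[k], values[k + 1]
        slope = (y1 - y0) / (x1 - x0)
        segments.append((slope, y0 - slope * x0))
    return segments


def eval_pwl(breakpoints, segments, x):
    """Evaluate y = J_i * x + h_i in the region that contains x."""
    i = bisect_right(breakpoints, x) - 1
    i = min(max(i, 0), len(segments) - 1)   # extend the end segments outward
    slope, intercept = segments[i]
    return slope * x + intercept


# Samples of the hypothetical characteristic max(v - 1, 0)**2 at four breakpoints.
bps = [0.0, 1.0, 2.0, 3.0]
vals = [0.0, 0.0, 1.0, 4.0]
segs = build_pwl(bps, vals)
y_mid = eval_pwl(bps, segs, 2.5)   # lies on the chord between (2, 1) and (3, 4)
```

The exact curve gives 2.25 at 2.5 while the table returns 2.5; adding a breakpoint inside that interval would reduce the gap, which is the accuracy trade-off described above.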

2.2.2.  Multiterminal  elements 

In n-dimensional space the boundary between two regions is an
(n-1)-dimensional hyperplane. In each region the pwl function is of the form

$$f(x) = J_i x + w_i = y$$

where $J_i$ is a constant matrix and $w_i$ is a constant vector. $J_i$ and
$w_i$ are defined in each pwl region.

Modeling of multiterminal, nonlinear elements by pwl functions in general
requires multidimensional tables. However, if the functions of several
variables representing the terminal characteristics of a multiterminal element
can be expressed as the sum of single-variable functions, or the sum of nested
functions, then the pwl representation can be expressed in terms of a set of
one-dimensional tables. This would save both storage and computation. In
general, however, such a model decomposition is not necessary, since one can
use simplices, as will be described in Section 2.3. In the next section, we
show how three-terminal elements, such as an MOS transistor, can be decomposed
into an interconnection of two-terminal elements. Then each of the two-terminal
elements is piecewise linearized. Each two-terminal pwl model can then be
stored in a one-dimensional table.


2.2.3.  Piecewise  linear  transistor  model 


The well-known simple equations for the channel current of an MOS
transistor are as follows [38]:

Linear region:

$$I_{DS} = K\left(2(V_{GS} - V_T)V_{DS} - V_{DS}^2\right); \quad 0 \le V_{DS} \le V_{GS} - V_T \qquad (2.2.a)$$

Saturation region:

$$I_{DS} = K(V_{GS} - V_T)^2; \quad 0 \le V_{GS} - V_T \le V_{DS} \qquad (2.2.b)$$

with

$$K = \frac{\mu\, \epsilon_{ox} W}{2\, t_{ox} L}$$

$\mu$ = average surface mobility of carriers in the channel of the device
$\epsilon_{ox}$ = permittivity of the oxide
$t_{ox}$ = thickness of oxide under gate
$L$ = length of the channel
$W$ = width of the channel

$V_{GS}$, $V_{DS}$, and $V_T$ are the gate-to-source, drain-to-source, and
threshold voltages, respectively. The terms $K$ and $V_T$ in the above
equations are considered to be constants. It is clear that a pwl approximation
of (2.2.a) requires the generation of a two-dimensional table with
$V_{GS} - V_T$ and $V_{DS}$ as independent variables. Although interpolation on
two- or higher-dimensional tables is feasible, it is much more efficient from
the computational and storage points of view to have a one-dimensional tabular
representation. Meyer [25] proposed the following model, which transforms (2.2)
into sums of functions of a single variable each:


$$I_{DS} = K f(V_{GS}) - K f(V_{GD}) = I_1 - I_2 \qquad (2.3.a)$$

where

$$f(V_{GX}) = \begin{cases} (V_{GX} - V_T)^2 & \text{for } V_{GX} \ge V_T \\ 0 & \text{for } V_{GX} < V_T \end{cases} \qquad (2.3.b)$$

The model depicting Equation (2.3) is shown in Figure 1. The model can be
transformed into an "Ebers-Moll-type" model as shown in Figure 2. The
$\alpha$'s in Figure 2 are equal to unity to keep $I_G = 0$.
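A quick numerical check helps confirm that the single-variable decomposition $I_{DS} = K f(V_{GS}) - K f(V_{GD})$, with $V_{GD} = V_{GS} - V_{DS}$, reproduces the region equations (2.2). The function names and parameter values below are hypothetical, and the cutoff branch (zero current for $V_{GS} < V_T$) is an added assumption, since (2.2) is only stated for the conducting regions.

```python
import math


def f(v_gx, v_t):
    """Single-variable function f(V_GX): quadratic above threshold, zero below."""
    return (v_gx - v_t) ** 2 if v_gx >= v_t else 0.0


def ids_decomposed(v_gs, v_ds, k, v_t):
    """Channel current as a difference of two single-variable terms."""
    v_gd = v_gs - v_ds
    return k * f(v_gs, v_t) - k * f(v_gd, v_t)


def ids_regions(v_gs, v_ds, k, v_t):
    """Channel current from the linear and saturation equations, for v_ds >= 0."""
    v_ov = v_gs - v_t
    if v_ov < 0.0:
        return 0.0                                       # cutoff (assumption)
    if v_ds <= v_ov:
        return k * (2.0 * v_ov * v_ds - v_ds ** 2)       # linear region
    return k * v_ov ** 2                                 # saturation region


K, VT = 2.0e-5, 1.0   # hypothetical device parameters
grid = [0.5 * n for n in range(11)]                      # 0.0 V to 5.0 V
max_error = max(abs(ids_decomposed(g, d, K, VT) - ids_regions(g, d, K, VT))
                for g in grid for d in grid)
```

Because each term depends on one voltage only, each can be tabulated and piecewise linearized separately, which is what makes one-dimensional tables sufficient.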

The next step is to approximate the quadratic equations in (2.3.b) by pwl
functions. An example of a graph of $I$ vs. $V_{GX}$ and its piecewise
linearized representation is shown in Figure 3. For timing analysis a
three-segment model has been found to provide acceptable accuracy. The
resulting circuit, depicted in Figure 4, consists of a conductance and a
current source, where the value of the conductance is the slope of the linear
function in a segment and the current source is the intercept of the function
with the y axis (the $I$ axis).

Using an implicit integration formula, such as the backward Euler formula,
to approximate the time derivatives in (2.1.e) and (2.1.f), the resulting pwl
circuit equations at time $t_n$ are of the form

$$F(x^n) = 0 \qquad (2.4)$$

where $x$ could be the modified nodal equation variables, and $x^n$ is the
value of $x$ at time $t_n$. Equation (2.4) is usually solved by using Newton's
method. At every iteration in Newton's method, the linearized equations are of
the form:

$$A x = h \qquad (2.5)$$

A number of iterations may be necessary before the process converges, provided
it does converge. The system (2.5), whose matrix $A$ is usually sparse in
circuit analysis, is solved by sparse matrix solution methods to reduce the
computational burden.

Fig. 1 Transistor model from Equation (2.2)

Fig. 2 "Ebers-Moll type" model of an MOS transistor

A modified Newton's method for pwl equations, known as Katzenelson's
method, guarantees convergence. In Katzenelson's method the next iteration
point is chosen to be the intersection of the solution trajectory with the
boundary hyperplane, unless the solution is found within the region. A
drawback of Katzenelson's method is the time it needs to determine boundary
crossings. A variant of Katzenelson's method, the pwl method on simplices,
finds the boundary crossings in a simpler and more efficient way. The pwl
method on simplices is explained next.
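The boundary-stopping rule can be illustrated in one dimension. The sketch below is a deliberately simplified scalar version, assuming a continuous pwl function with strictly positive slopes; the function name and the example data are hypothetical, and the multidimensional method replaces the interval end points with boundary hyperplanes.

```python
from bisect import bisect_right


def katzenelson_scalar(bps, slopes, intercepts, y, x0, max_steps=100):
    """Solve f(x) = y for a continuous, strictly increasing scalar pwl function.

    Region i lies between bps[i-1] and bps[i] (unbounded at both ends) and has
    f(x) = slopes[i] * x + intercepts[i]. A Newton step is taken inside the
    current region; if it would leave the region, the iterate stops at the
    boundary and the neighbouring region's linear piece is used next.
    """
    region = bisect_right(bps, x0)
    x = x0
    for _ in range(max_steps):
        fx = slopes[region] * x + intercepts[region]
        target = x + (y - fx) / slopes[region]           # Newton step in this region
        lo = bps[region - 1] if region > 0 else float("-inf")
        hi = bps[region] if region < len(bps) else float("inf")
        if lo <= target <= hi:
            return target                                # solution lies in this region
        if target > hi:
            x, region = hi, region + 1                   # stop at the upper boundary
        else:
            x, region = lo, region - 1                   # stop at the lower boundary
    raise RuntimeError("no solution found within max_steps")


# Hypothetical function: f = x (x <= 1), 3x - 2 (1 <= x <= 2), 0.5x + 3 (x >= 2).
bps = [1.0, 2.0]
slopes = [1.0, 3.0, 0.5]
intercepts = [0.0, -2.0, 3.0]
x_up = katzenelson_scalar(bps, slopes, intercepts, 5.5, 0.0)
x_down = katzenelson_scalar(bps, slopes, intercepts, -3.0, 3.0)
```

Starting from 0, the iterate stops at the boundaries 1 and 2 before the step inside the last region lands on the solution; every boundary stop costs an explicit crossing computation, which is the overhead the simplex method is meant to avoid.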


2.3.  Piecewise  Linear  Approach  on  Simplices 


This method was first proposed by Chien and Kuh [40]. It is conceptually
similar to the well-known Katzenelson method. The advantages of this method
are as follows:

1. There is no need to determine boundary crossings as is done in the
Katzenelson method. Instead, a vertex replacement is performed on simplices.

2. There is no need to calculate the Jacobian matrix as is required in
the Newton-Raphson formula. In this sense the method is more general, since
a function which does not have a derivative can still be solved.

3. The functions describing the current-voltage and charge-voltage
characteristics need not be known. Sample points on the multidimensional
characteristics are sufficient for the computation. This implies that
new devices based on new technologies can be studied without having
the function governing the operation of the device derived.

In the next few paragraphs the definition of a simplex is reviewed. This is
followed by a description of the Chien and Kuh method. Finally, an algorithm
based on the approach is presented.

Let $x_0, \dots, x_n \in R^n$. A simplex, known also as a closed convex hull,
$S(x_0, \dots, x_n)$ is defined by

$$S(x_0, \dots, x_n) = \left\{ x \in R^n \;\middle|\; x = \sum_{i=0}^{n} \mu_i x_i,\ 0 \le \mu_i \le 1,\ i = 0, 1, 2, \dots, n,\ \text{and}\ \sum_{i=0}^{n} \mu_i = 1 \right\}$$

$x_0, \dots, x_n$ are called the vertices of the simplex $S(x_0, \dots, x_n)$.
A simplex $S(x_0, \dots, x_n)$ is called proper if and only if the
$(n+1) \times (n+1)$ matrix

$$\begin{bmatrix} x_0 & \cdots & x_n \\ 1 & \cdots & 1 \end{bmatrix}$$

is nonsingular.

The boundary $H_k$ corresponding to the vertex $x_k$ is defined as

$$H_k = \left\{ x \in R^n \;\middle|\; x = \sum_{i \ne k} \mu_i x_i \right\}$$

that is, the face of the simplex on which $\mu_k = 0$.

As will be explained later in the chapter, this definition of boundary is
very useful in determining where the solution curve should go. Because there is
a one-to-one correspondence between a boundary and a vertex, instead of
determining which boundary is to be crossed by the solution curve, the
corresponding vertex to be removed is determined. This vertex removal turns out
to be simpler in calculation and programming than finding the boundary
crossing. Reference [40] contains a complete explanation and derivation of the
method. The following paragraphs describe the idea and the algorithm.

Let the original function to be piecewise linearized be $f(x) = y$, where
$f(\cdot) : R^n \to R^n$. A function $g(\cdot)$ approximating the original
function $f(\cdot)$ on $S(x_0, \dots, x_n)$ is defined by

$$g(x) = [\, f(x_0) \ \cdots \ f(x_n) \,]\, \mu$$

for $x \in S(x_0, \dots, x_n)$ and $\mu = [\mu_0, \mu_1, \dots, \mu_n]^T$ as
defined previously for the representation of $x \in S(x_0, \dots, x_n)$.

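In summary, the representation of a point x in a simplex $S(x_0, \ldots, x_n)$
and the pwl function $g(x)$ are as follows:

\[
x \in S(x_0, \ldots, x_n): \qquad
\begin{bmatrix} x \\ 1 \end{bmatrix} =
\begin{bmatrix} x_0 & \cdots & x_n \\ 1 & \cdots & 1 \end{bmatrix} \mu
\]

\[
\begin{bmatrix} g(x) \\ 1 \end{bmatrix} =
\begin{bmatrix} f(x_0) & \cdots & f(x_n) \\ 1 & \cdots & 1 \end{bmatrix} \mu
\]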
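The representation above can be illustrated with a small sketch. It computes the weights $\mu$ of a point in a simplex by solving the bordered linear system, then evaluates the pwl function $g(x)$ from the vertex values. The function names `solve_linear`, `barycentric`, and `pwl_value` are hypothetical and are not taken from PLATINUM; exact rational arithmetic is an assumption made here to keep the example free of rounding effects.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular: the simplex is not proper")
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def barycentric(vertices, x):
    """Return mu such that sum(mu_i * x_i) = x and sum(mu_i) = 1."""
    n = len(x)
    rows = [[v[d] for v in vertices] for d in range(n)]  # one row per coordinate
    rows.append([1] * len(vertices))                     # the row of ones
    return solve_linear(rows, list(x) + [1])


def pwl_value(vertices, f_values, x):
    """Evaluate g(x) = [f(x_0), ..., f(x_n)] * mu."""
    mu = barycentric(vertices, x)
    dim = len(f_values[0])
    return [sum(w * fv[d] for w, fv in zip(mu, f_values)) for d in range(dim)]


# Usage: a triangle in the plane and the linear map f(x, y) = (2x + y, x - y).
verts = [(0, 0), (1, 0), (0, 1)]
f_vals = [(0, 0), (2, 1), (1, -1)]
point = (Fraction(1, 4), Fraction(1, 4))
print(barycentric(verts, point))
print(pwl_value(verts, f_vals, point))
```

A point lies inside the simplex exactly when every weight returned by `barycentric` is between 0 and 1; a negative weight identifies a vertex whose opposite boundary has been crossed.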

Once the boundary to be crossed is identified, one needs to determine
the new simplex entered. Since corner crossing is not allowed, all vertices
except one remain the same. It is shown in [40] that the new vertex $x_k'$ is a
combination of the old value $x_k$ and two of its adjacent vertices; that is,

\[
x_k' = x_{k+1} + x_{k-1} - x_k
\]

where k indicates the position of the altered vertex.

The solution algorithm given below is a slightly modified version of the
one described in [40].

Step 1 :

Choose $x_0$ and $x_l = x_{l-1} + E_l$, $l = 1, 2, \ldots, n$, where
$E_l = [0, \ldots, 0, \epsilon_l, 0, \ldots, 0]^T$ and $\epsilon_l > 0$ is the
l-th component of $\epsilon$.

Step 2 :

Let $\mu^0 = [1, \ldots, 1]^T / (n+1)$; that is,
$x^0 = \frac{1}{n+1} \sum_{l=0}^{n} x_l$ ($x^0$ is the center of the initial
simplex). Set i = 0.

Step 3 :

Compute $\hat{\mu}$ according to the equation

\[
\begin{bmatrix} f(x_0) & \cdots & f(x_n) \\ 1 & \cdots & 1 \end{bmatrix} \hat{\mu}
= \begin{bmatrix} y \\ 1 \end{bmatrix}
\]

Step 4 :

If every component of $\hat{\mu}$ is non-negative, a solution is found:
$x = [x_0, \ldots, x_n]\, \hat{\mu}$. STOP.

Otherwise, compute $t^*$ from

\[
\mu(t) = \mu^i + t\,(\hat{\mu} - \mu^i)
\]

such that

i) $0 \le \mu(t)_l \le 1$ for $0 \le t \le t^*$ and every component l,

ii) there exists one and only one index k satisfying $\mu(t^*)_k = 0$,

iii) $1 > \mu(t^*)_l > 0$ for $l \ne k$.

In practice :

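Find the minimum t ($t_{min}$) from

\[
0 = \mu_k^i + t\,(\hat{\mu}_k - \mu_k^i) \quad \text{if } (\hat{\mu}_k - \mu_k^i) < 0
\]

-or- from

\[
0 = \mu_k^i - t\,(\hat{\mu}_k - \mu_k^i) \quad \text{if } (\hat{\mu}_k - \mu_k^i) > 0
\]

Then calculate

\[
\mu(t_{min}) = \mu^i \pm t_{min}\,(\hat{\mu} - \mu^i)
\]

Step 5 :

Replace $x_k$ by $(x_{k+1} + x_{k-1} - x_k)$.

Let i = i+1 and go to Step 3.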
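The boundary search of Step 4 can be sketched as a ratio test. The sketch below moves from the current weight vector toward the newly computed one and reports the first component that reaches zero, which identifies the vertex k to be replaced in Step 5. It covers only the case of a decreasing component, and the name `boundary_step` and the use of exact fractions are assumptions made for illustration, not part of the algorithm in [40].

```python
from fractions import Fraction


def boundary_step(mu_old, mu_hat):
    """Return (t_min, k, mu_at_boundary), or None if no boundary is crossed.

    mu_old is the weight vector at the current point, mu_hat the one computed
    in Step 3. Only a decreasing component can reach zero along the segment.
    """
    best_t, best_k = None, None
    for k, (old, new) in enumerate(zip(mu_old, mu_hat)):
        delta = new - old
        if delta < 0:
            t = -old / delta  # solves 0 = old + t * delta
            if best_t is None or t < best_t:
                best_t, best_k = t, k
    if best_t is None or best_t >= 1:
        return None  # mu_hat is reached before any component hits zero
    mu_new = [old + best_t * (new - old) for old, new in zip(mu_old, mu_hat)]
    return best_t, best_k, mu_new


# Usage: start at the center of a triangle; the target has a negative weight.
center = [Fraction(1, 3)] * 3
target = [Fraction(-1, 3), Fraction(2, 3), Fraction(2, 3)]
print(boundary_step(center, target))
```

Because the weights always sum to one, a component cannot exceed 1 while all the others are still non-negative, so testing for the first zero is sufficient in this sketch.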

We found that the method is slow in analyzing circuits. To reduce the
computation time the nonlinear network elements are piecewise linearized and
tabulated. As a result, two things are piecewise linearized: the network
elements, and the solution space, which becomes the space of simplices.
The piecewise linearization of the network elements is not proposed in the
original idea given in [40]. A variable time-step method described in Wei's
thesis [43] is used. A 10-stage chain of inverters analyzed using this method
requires about 20 seconds of CPU time while SPICE needs about 13 seconds.

Parallel implementation of the method could reduce the computation time.
Each vertex of the simplex consists of a set of numbers representing a set of
voltages. For example, an inverter with a pass transistor is represented by a
simplex with 3 vertices. Each vertex consists of 2 numbers, representing the 2
voltages in the circuit. Each vertex, which is a column in the matrix of Step 3
of the algorithm, can be solved in parallel. Because each vertex provides a
complete set of voltages, the entries of the corresponding column can be
computed concurrently. This is performed until all the columns of the matrix are
calculated.

The results of the implementation of the method on the Alliant FX/8, a
vector-parallel computer with shared memory, were found to be discouraging.
The speedup was less than a factor of two as compared to SPICE.
Although parallelization of the matrix entries is possible, the resulting matrix is
dense, and therefore no sparse matrix technique can be applied to reduce
computation. As the circuit becomes larger, the calculation of the dense matrix
could become prohibitive.


2.4.  Relaxation  Methods 

The circuit analysis method described so far solves (2.4) as well as (2.5)
directly; i.e., no relaxation is used. Alternatively, relaxation techniques could
be used to solve (2.5) (e.g., linear Gauss-Seidel or Gauss-Jacobi) or (2.4)
(nonlinear Gauss-Seidel or Gauss-Jacobi). In these methods the time step is
controlled at the global circuit level, and thus they are referred to as pointwise
relaxation methods. The pointwise Gauss-Seidel method of solving (2.4) is as
follows:

repeat { foreach ( j in N ) {
    solve $g_j(x_1^{k+1}, \ldots, x_j^{k+1}, x_{j+1}^{k}, \ldots, x_n^{k}) = 0$ for $x_j^{k+1}$ ; } }     (2.6)
until ( $\| x^{k+1} - x^{k} \| \le \epsilon$ )

The foreach implies that the computation for each value j in the ordered set N
must proceed sequentially and in the order specified by the set.

The pointwise Gauss-Jacobi method of solving (2.4) is

repeat { forall ( j in N ) {
    solve $g_j(x_1^{k}, \ldots, x_j^{k+1}, \ldots, x_n^{k}) = 0$ for $x_j^{k+1}$ ; } }     (2.7)
until ( $\| x^{k+1} - x^{k} \| \le \epsilon$ )

The forall implies that the computation for all values of j in the ordered set N
may proceed concurrently, i.e., in parallel and in any order.
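The difference between the two sweeps can be sketched for the linear case, where solving the j-th equation for its own unknown is a single division. The system and the function names below are assumptions chosen for illustration; the nonlinear case would replace the division by a scalar nonlinear solve.

```python
def gauss_seidel_sweep(a, b, x):
    """One foreach sweep: each unknown uses the newest values available."""
    x = list(x)
    n = len(b)
    for j in range(n):  # must run sequentially, in order
        s = sum(a[j][l] * x[l] for l in range(n) if l != j)
        x[j] = (b[j] - s) / a[j][j]
    return x


def gauss_jacobi_sweep(a, b, x):
    """One forall sweep: each unknown uses only values of the previous iterate."""
    n = len(b)
    return [
        (b[j] - sum(a[j][l] * x[l] for l in range(n) if l != j)) / a[j][j]
        for j in range(n)  # entries are independent and could run in parallel
    ]


def relax(a, b, x0, sweep, tol=1e-10, max_iter=500):
    """Repeat sweeps until the change between iterates is at most tol."""
    x = list(x0)
    for iteration in range(1, max_iter + 1):
        x_new = sweep(a, b, x)
        if max(abs(p - q) for p, q in zip(x_new, x)) <= tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("relaxation did not converge")


# Usage: a small diagonally dominant system with solution x = (1, 2).
a = [[4.0, 1.0], [1.0, 3.0]]
b = [6.0, 7.0]
print(relax(a, b, [0.0, 0.0], gauss_seidel_sweep))
print(relax(a, b, [0.0, 0.0], gauss_jacobi_sweep))
```

On this example the Gauss-Seidel sweep needs fewer iterations, while the Gauss-Jacobi sweep is the one whose inner loop can be distributed over processors.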

Relaxation techniques can also be applied at the differential equation level;
i.e., each subcircuit can be solved using its own time step. The Gauss-Seidel
waveform relaxation method of solving a system of nonlinear differential
equations of the form

\[
\dot{x} = f(x, t)  \tag{2.8}
\]

is

\[
\dot{x}_i^{k+1} = f_i(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i^{k+1}, x_{i+1}^{k}, \ldots, x_n^{k}, t),
\qquad x_i^{k+1}(0) = x_i^{k}(0)  \tag{2.9}
\]

while the Gauss-Jacobi waveform relaxation method is

\[
\dot{x}_i^{k+1} = f_i(x_1^{k}, \ldots, x_{i-1}^{k}, x_i^{k+1}, x_{i+1}^{k}, \ldots, x_n^{k}, t),
\qquad x_i^{k+1}(0) = x_i^{k}(0)  \tag{2.10}
\]
The vector $x_i$ in the above equation corresponds to the variables in
subcircuit i. In the waveform relaxation method the subcircuit variables are solved
for a time window T. In MOS circuits, subcircuits are often obtained by
partitioning the circuit into dc-connected components. Floating capacitors such as
gate-source and gate-drain capacitors cause local feedback among subcircuits. In
timing analysis these small floating capacitors are replaced by equivalent
capacitances from the nodes to the ground [27]. As a result, the local feedback paths
among the subcircuits caused by the floating capacitors are eliminated. When
applying the Gauss-Seidel method, sequencing the subcircuits for analysis could
reduce the computation time. In the following subsection, sequencing of the
subcircuits for analysis is described.
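A minimal sketch of Gauss-Seidel waveform relaxation on two coupled scalar "subcircuits" is given here. Each sweep solves one equation over the whole time window while the other waveform is held fixed, and the second subcircuit uses the waveform just computed for the first. The test system, the fixed common step h, and the backward Euler discretization are assumptions made to keep the example short; the method described in the text lets each subcircuit choose its own time step.

```python
def solve_scalar_window(a, coupling, other, u, x0, h):
    """Backward Euler for x' = a*x + coupling*other(t) + u over one window.

    `other` is the waveform of the neighbouring subcircuit, sampled on the
    same grid, and is treated as a known input.
    """
    xs = [x0]
    for p in range(1, len(other)):
        xs.append((xs[-1] + h * (coupling * other[p] + u)) / (1.0 - h * a))
    return xs


def gauss_seidel_wr(x10, x20, h, steps, tol=1e-12, max_sweeps=200):
    """Solve x1' = -2*x1 + x2 + 1, x2' = x1 - 2*x2 by waveform relaxation."""
    x1 = [x10] * (steps + 1)  # initial guess: constant waveforms
    x2 = [x20] * (steps + 1)
    for sweep in range(1, max_sweeps + 1):
        new_x1 = solve_scalar_window(-2.0, 1.0, x2, 1.0, x10, h)
        new_x2 = solve_scalar_window(-2.0, 1.0, new_x1, 0.0, x20, h)  # fresh x1
        change = max(
            max(abs(p - q) for p, q in zip(new_x1, x1)),
            max(abs(p - q) for p, q in zip(new_x2, x2)),
        )
        x1, x2 = new_x1, new_x2
        if change <= tol:
            return x1, x2, sweep
    raise RuntimeError("waveform relaxation did not converge")


# Usage: relax over the window [0, 2] with 40 steps.
x1, x2, sweeps = gauss_seidel_wr(0.0, 0.0, 0.05, 40)
print(sweeps, x1[-1], x2[-1])
```

If the second subcircuit did not feed back into the first (a "one-way" partition), the first sweep would already give the final waveforms, which is the motivation for sequencing.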

2.5.  Analysis  Sequencing 

Analysis sequencing is applied after the circuit is partitioned. While
partitioning and sequencing involve some overhead, the overall result is a reduced
computation time. The idea is that if it is possible to partition the circuit into
"one-way" subcircuits, then only one sweep of Gauss-Seidel analysis is needed
for solving the circuit.

A circuit which has been partitioned into dc-connected subcircuits can be
represented by a directed graph G(V,E) where V is a set of vertices representing
subcircuits and E is a set of edges depicting signal lines from fanout to fanin.
In the circuit, an edge $e \in E$ with an arrow from vertex x to vertex y is the
result of dependent current sources due to MOS transistors in subcircuit y. The
following definitions about graphs will be used in the description of the
sequencing algorithms.

Definition A :

Given a vertex v of G(V,E), the sets of fanin vertices and fanout vertices of
vertex v are

\[
fin(v) = \{\, w \in V \mid (w,v) \in E \,\}
\]
\[
fout(v) = \{\, w \in V \mid (v,w) \in E \,\}
\]

where (x,y) denotes an edge from vertex x to vertex y. The numbers of fanin
and fanout vertices of v are denoted by nfin(v) and nfout(v), respectively. In
the following definitions and theorems we will consider ordering or sequencing
the vertices of graph G(V,E) when the graph G(V,E) does not contain any
feedback. This case arises in combinational circuits consisting of simple transistor
models.

Definition B :

Vertex $v_i$ in G(V,E) is a predecessor of vertex $v_j$ if and only if there is a
directed edge from $v_i$ to $v_j$.

If $v_i$ is a predecessor of $v_j$, then $v_j$ is a successor of $v_i$.

Definition C :

A linear ordering or sequencing is called a topological order if for every
predecessor $v_i$ of $v_j$ in the graph G(V,E), $v_i$ precedes $v_j$ in the linear
ordering.

Theorem A [31] :

The vertices in a directed graph can be arranged in a topological order if and
only if the directed graph is acyclic.

The theorem implies that for any combinational circuit the graph
representing the circuit is acyclic and, therefore, the subcircuits can be arranged
in a topological order. Many circuits, however, contain feedback,
and therefore the corresponding graph is cyclic. The parts of the graph that
contain feedback edges (known as the strongly connected components or sets)
are detected using depth-first search techniques. Each strongly connected
component is replaced by one new node. After the replacement the resulting
new graph G' is acyclic and the sequencing method for an acyclic graph can be
applied. Tarjan's depth-first search algorithm [24] to find the strongly
connected components is as follows:

Step 1 :

(initialization step) Mark all the edges "unused." For every v ∈ V let
k(v) ← 0 and f(v) be undefined { f(v) = father of v }.

Empty S { S is a stack that stores the vertices in the order in which
they are discovered }.

Let i ← 0 and v ← s { s is the 0 node or source node }.

Step 2 :

i ← i + 1, k(v) ← i, L(v) ← i and put v on S.

Step 3 :

If there are no "unused" incident edges from v, go to Step 6.

Step 4 :

Choose an "unused" edge e = v → u. Mark the edge e "used."

Step 5 :

(i) If k(u) = 0, then f(u) ← v, v ← u. Go to Step 2.

(ii) If k(u) > k(v) (e is a forward edge), go to Step 3.

(iii) If k(u) < k(v) and if u is not on S (u and v do not belong to
the same component), go to Step 3.

(iv) If k(u) < k(v) and if both vertices are in the same component
(that is, u is on S), let L(v) ← min{ L(v), k(u) } and go to Step 3.

Step 6 :

If L(v) = k(v), delete all the vertices from S down to and including v;
these vertices form a strongly connected component.

Step 7 :

(i) If f(v) is defined, then L(f(v)) ← min{ L(f(v)), L(v) }, v ← f(v)
and go to Step 3;

(ii) If f(v) is undefined and if there is a vertex u for which k(u) = 0,
then let v ← u and go to Step 2.

Step 8 :

If all vertices have been traced then STOP.
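The steps above can be sketched recursively. The sketch keeps the same bookkeeping: k(v) is the discovery number, L(v) the lowpoint, and S the stack. The recursion replaces the explicit father pointer f(v), and the dictionary-of-fanout-lists graph format is an assumption made for illustration.

```python
def strongly_connected_components(fanout):
    """Return the strongly connected components of a directed graph.

    `fanout` maps each vertex to the list of vertices it drives. Vertices
    that appear only as targets are handled as well.
    """
    k = {}             # discovery number k(v)
    low = {}           # lowpoint L(v)
    stack = []         # stack S
    on_stack = set()
    components = []
    counter = [0]

    def visit(v):
        counter[0] += 1
        k[v] = low[v] = counter[0]
        stack.append(v)
        on_stack.add(v)
        for u in fanout.get(v, ()):
            if u not in k:                 # tree edge: descend (Step 5 i)
                visit(u)
                low[v] = min(low[v], low[u])   # Step 7 i, on return
            elif u in on_stack:            # same component (Step 5 iv)
                low[v] = min(low[v], k[u])
        if low[v] == k[v]:                 # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in list(fanout):
        if v not in k:
            visit(v)
    return components


# Usage: a three-vertex feedback loop driving one further subcircuit.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
print(strongly_connected_components(graph))
```

The components come out in reverse topological order of the condensed graph, so reversing the returned list gives an order in which each component appears before the components it drives. For very deep graphs an iterative formulation, closer to the step list, avoids Python's recursion limit.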


After the strongly connected components have been identified using the
above algorithm, and each scc is replaced by one node, the resulting acyclic
graph G' is levelized using the following algorithm.

Algorithm [44] (assign level to each vertex in G').

BEGIN
    Assign input vertices of the acyclic graph G' to level 0 ; k ← 0 ;
L:  FOR each vertex v in level k DO
        FOR each vertex w ∈ fout(v) DO
        BEGIN
            nfin(w) ← nfin(w) - 1 ;
            IF nfin(w) = 0 THEN
                assign w to level k+1 ;
        END
    k ← k+1 ;
    IF level k is not empty THEN
        GO TO L ;
END
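The levelizing procedure can be sketched as follows. A vertex is assigned to a level as soon as all of its fanin vertices have been processed, which is tracked by counting down nfin(w). The function name `levelize` and the dictionary graph format are assumptions made for illustration.

```python
def levelize(fanout, inputs):
    """Assign levels to the vertices of an acyclic graph.

    `fanout` maps each vertex to the vertices it drives; `inputs` are the
    vertices placed on level 0. Returns a list of levels, each a list of
    vertices.
    """
    nfin = {v: 0 for v in fanout}
    for v in fanout:
        for w in fanout[v]:
            nfin[w] = nfin.get(w, 0) + 1

    levels = [list(inputs)]
    while levels[-1]:
        next_level = []
        for v in levels[-1]:
            for w in fanout.get(v, ()):
                nfin[w] -= 1
                if nfin[w] == 0:      # every fanin of w has been processed
                    next_level.append(w)
        levels.append(next_level)
    return levels[:-1]                # drop the trailing empty level


# Usage: one input vertex driving two subcircuits that both drive a third.
graph = {"in": ["s1", "s2"], "s1": ["s3"], "s2": ["s3"], "s3": []}
print(levelize(graph, ["in"]))
```

Analyzing the levels in order, starting at level 1, visits every subcircuit after all the subcircuits that drive it.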

After the subcircuits are assigned levels using the above algorithm, they are
analyzed starting from subcircuits connected to inputs (level 1) to the ones
connected to the outputs.

Algorithm 3
BEGIN
    k ← 1 ;
L:  FOR each vertex v of G' at level k DO
        time domain analysis of corresponding subcircuits ;
    k ← k+1 ;
    IF level k is not empty THEN
        GO TO L ;
END

In many cases in digital circuits only some portion of the output nodes are
of interest. Each one of these nodes is sometimes affected by only a small
portion of the subcircuits. This implies that only some subcircuits need to be
analyzed even if in reality the rest of the circuit is active. Since only some
parts of the system are analyzed, the computation time is reduced. The method
applied to take advantage of this fact is known in other areas as
"back-chaining." Basically, starting at the back end of the graph (that is, the output
nodes of interest) one traces back until reaching the front end of the graph
(the input nodes). The vertices traced during the process are the subcircuits
that need to be analyzed. A typical algorithm that performs this back-chaining
task is given in [43].
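Back-chaining amounts to a reachability search along fanin edges. The sketch below is a generic version written for illustration and is not the algorithm of [43]; the name `back_chain` and the dictionary format are assumptions.

```python
def back_chain(fanin, outputs_of_interest):
    """Return the set of vertices that can influence the given outputs.

    `fanin` maps each vertex to the list of vertices that drive it.
    """
    needed = set()
    stack = list(outputs_of_interest)
    while stack:
        v = stack.pop()
        if v in needed:
            continue
        needed.add(v)
        stack.extend(fanin.get(v, ()))  # trace back toward the inputs
    return needed


# Usage: two independent signal paths; only the path into s3 is of interest.
fanin = {"s3": ["s1"], "s4": ["s2"], "s1": ["in1"], "s2": ["in2"]}
print(back_chain(fanin, ["s3"]))
```

Subcircuits outside the returned set can be skipped during analysis without changing the waveforms at the selected outputs.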

There are three ways of solving the strongly connected components. The
obvious one is to solve them as one block. The problem with this approach is
that the block might be too large. For instance, when the feedback connections
are from subcircuits at the back end to the ones at the front end, the
strongly connected components are practically the entire circuit, and if the
circuit is large then the block may be too large to be solved at once. A better way
is to apply a dynamic partitioning method, which will be described in the next
chapter. The third solution involves breaking the strongly connected
component into even smaller subcircuits. This is done by removing some edges from
the scc so that the original scc becomes acyclic, and then applying a
relaxation-based solution method to the scc that has become acyclic.

Having described the Gauss-Seidel waveform relaxation method for circuit
analysis and the piecewise linearization of transistor models, the fast pwl
method will now be defined.

2.6.  Fast  Piecewise  Linear  Approach 

Consider a circuit or a system described by pwl continuous equations of
the form

\[
\dot{x}(t) = f(x(t), y(t)) = A_m x(t) + w_m + y(t), \qquad m = 1, 2, \ldots, r  \tag{2.11}
\]

where $x(\cdot), y(\cdot) : [0,T] \to R^n$, and where $R^n$ is divided by
hyperplanes into r polyhedral regions. $A_m$ is a constant $n \times n$ matrix and
$w_m$ a constant n vector defined for each region m.

Kaye and Sangiovanni-Vincentelli [13] use Laplace transforms and the
Gauss-Jacobi method to compute the solutions of the pwl systems of equations.
The set of equations is partitioned into systems of scalar equations. A
drawback to applying the Laplace transform method to the solution of pwl equations
is the time-consuming effort of computing the intersection of the solution
trajectories with the region boundaries. The method presented here is based on the
work by Jung and Hajj [39]. It combines the waveform relaxation method [11]
and the Gauss-Seidel iterative method to solve the piecewise linearized
equations. The original method [39] suffers from a slow convergence problem when
tight coupling exists between the equations. A modification to decrease the
computation time is explained in the following paragraphs.

The solution of Equation (2.11) based on a Gauss-Seidel pwl WR algorithm
[39] is as follows:

Step (0) : Set $x_i^0(t) = x_i(0)$, $i = 1, 2, \ldots, n$, $t \in [t_0, t_1]$,
$0 \le t_0 < t_1 \le T$.

Step (k) : Solve

\[
\dot{x}_1^k(t) = a_{11m} x_1^k(t) + \sum_{j=2}^{n} a_{1jm} x_j^{k-1}(t) + w_{1m} + y_1(t),
\qquad x_1^k(t_0) = x_1(t_0),
\]

for $x_1^k(t)$, $t \in [t_0, t_1]$.

Solve

\[
\dot{x}_i^k(t) = a_{iim} x_i^k(t) + \sum_{j=1}^{i-1} a_{ijm} x_j^{k}(t)
+ \sum_{j=i+1}^{n} a_{ijm} x_j^{k-1}(t) + w_{im} + y_i(t),
\qquad x_i^k(t_0) = x_i(t_0),
\]

for $x_i^k(t)$, $t \in [t_0, t_1]$.

Solve

\[
\dot{x}_n^k(t) = a_{nnm} x_n^k(t) + \sum_{j=1}^{n-1} a_{njm} x_j^{k}(t) + w_{nm} + y_n(t),
\qquad x_n^k(t_0) = x_n(t_0),
\]

for $x_n^k(t)$, $t \in [t_0, t_1]$.

At each step, instead of solving n coupled differential equations, one needs to
solve only n decoupled ones. The process is repeated until convergence is
obtained.

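Each of the above equations is of the form

\[
\dot{x}_i(t) = a_{im} x_i(t) + w_{im} + y_i(t), \qquad t \in [t_0, t_1]
\]

where the coupling terms, known from the current or the previous iteration,
are included in $y_i(t)$.

The solution to this linear first-order differential equation is

\[
x_i(t) = x_i(t_0)\, e^{a_{im}(t - t_0)}
+ \int_{t_0}^{t} e^{a_{im}(t - \tau)} \left[ w_{im} + y_i(\tau) \right] d\tau
\]

If $y_i(t)$ is a constant $c_i$, the solution can be found by inspection:

\[
\text{If } a_{im} \ne 0: \quad
x_i(t) = \left[ x_i(t_0) + \frac{w_{im} + c_i}{a_{im}} \right] e^{a_{im}(t - t_0)}
- \frac{w_{im} + c_i}{a_{im}}
\]

\[
\text{If } a_{im} = 0: \quad
x_i(t) = x_i(t_0) + (w_{im} + c_i)(t - t_0)
\]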

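In the process of finding $x_i(t)$ on $[0,T]$ the values of $a_{im}$ and $w_{im}$
change due to the fact that the solution trajectory moves from one region to the
next. Hence it is necessary to be able to find t when a new region is entered,
which can be done as follows, where b is the value of $x_i$ on the region boundary.

\[
\text{If } a_{im} \ne 0: \quad t = t_0 + \frac{\ln(v_m)}{a_{im}},
\qquad \text{where } v_m = \frac{b + (w_{im} + c_i)/a_{im}}{x_i(t_0) + (w_{im} + c_i)/a_{im}}
\]

\[
\text{If } a_{im} = 0: \quad t = t_0 + \frac{b - x_i(t_0)}{w_{im} + c_i}
\]

The conditions for $t \ge t_0$ are

1. if $a_{im} > 0$ then $v_m > 1$, or

2. if $a_{im} < 0$ then $0 < v_m < 1$, or

3. if $a_{im} = 0$ then $(b - x_i(t_0))/(w_{im} + c_i) > 0$.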
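The two closed-form results, the solution within a region and the time at which a region boundary is reached, can be sketched as a pair of functions. The names `pwl_solution` and `crossing_time` are hypothetical, and the numerical values in the usage example (R = 1, C = 1, I = 5, chosen so that the final value is 5 volts) are assumptions and not the parameters used in PLATINUM.

```python
import math


def pwl_solution(a, w, c, x0, t0, t):
    """Solution of x' = a*x + w + c with x(t0) = x0, for constant a, w, c."""
    if a != 0.0:
        k = (w + c) / a
        return (x0 + k) * math.exp(a * (t - t0)) - k   # exponential segment
    return x0 + (w + c) * (t - t0)                     # straight-line segment


def crossing_time(a, w, c, x0, t0, b):
    """Time t >= t0 at which x reaches the boundary value b, or None."""
    if a != 0.0:
        k = (w + c) / a
        if x0 + k == 0.0:
            return None            # x stays at its equilibrium value
        v = (b + k) / (x0 + k)
        if v <= 0.0:
            return None            # b lies beyond the asymptote
        dt = math.log(v) / a
    else:
        if w + c == 0.0:
            return None            # x is constant
        dt = (b - x0) / (w + c)
    return t0 + dt if dt >= 0.0 else None


# Usage: C*V' + V/R = I with R = C = 1 and I = 5, i.e. V' = -V + 5,
# starting at 3.25 V with the next region boundary at 4 V.
t_cross = crossing_time(-1.0, 5.0, 0.0, 3.25, 0.0, 4.0)
print(t_cross, pwl_solution(-1.0, 5.0, 0.0, 3.25, 0.0, t_cross))
```

The `None` results correspond to the cases excluded by the three conditions above: the boundary is never reached for $t \ge t_0$, so the current region remains valid for the rest of the interval.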

In general $y_i(t)$ is not a constant. However, if $y_i(t)$ is approximated by a
"staircase" function, that is, $y_i(t)$ is divided into intervals and in each
interval it is represented by a step input, then the solution $x_i(t)$ for each
interval can be found by inspection as derived above.

As an example, consider a pwl inverter circuit shown in Figure 5. The
transistor model and its pwl approximation are explained in Section 2.2.3.
Solving the node equation at the output node gives
Q  Si.:  ‘r  I D  1  +  I L  ;  —  1 L  1  ~  I D2  ~  ^ 

C  S  -r  ,*esj  +  :sJ  —  V  -  iJJ  —  \  '.*gsl  -  isl 

—  ■  gW  - 

-  v  ,*zJ!  -  :Jl  =  0 
The output is low initially and a falling step input is applied. At t=0 the input
falls to zero. Checking the pwl model gives isl as the only nonzero term. The
above equation becomes

\[
C \dot{V}_{out} = isl  \tag{2.12}
\]

The solution to this equation is:

\[
V_{out} = V_{out,init} + isl * t / C
\]

where $V_{out,init}$ is the initial value of $V_{out}$.


If t is sufficiently large, then at some point in time $V_{out}$ becomes so large
that the linear model is not valid anymore. In other words, a new region in the
transistor pwl characteristics is entered. When this happens, which in our case
is at $V_{out}$ = 3.25 volts, the time when the new region is entered is calculated
using (2.12) and the pwl elements that are affected by $V_{out}$ are updated. Now
the gdd and idd terms are also nonzero, and the output node equation is of the
form

\[
C \dot{V}_{out} + V_{out}/R = I, \qquad R, I \ne 0
\]

and the solution is

\[
V_{out} = V_f + (V_i - V_f) * \exp(-t/tc)  \tag{2.13}
\]

where $V_f = I*R$,
$V_i$ = initial $V_{out}$ (=3.25),
$tc = R*C$.

Again the model is only valid until another region is encountered, which in our
case is at $V_{out}$ = 4. The time when the new region is entered is calculated using
(2.13); the pwl elements are updated and the same process continues.

When $V_{out}$ reaches 5 volts, the current sources of the load cancel each
other (the driver is still off) and there is no more change to the output as long
as the input does not change.


Suppose the output of this inverter is fed into another inverter. Then this
output is approximated by a "staircase" function, as shown in Figure 6a (for
a falling waveform the approximation is shown in Figure 6b), and the same
analysis as above is carried out for each time interval.

The above solution of $x_i(t)$ assumes an input time function approximated
by a "staircase" function. Actually, the input can be approximated by other
types of functions, for example a ramp function. However, the solution would
then contain terms proportional to t and $\exp(-t/\tau)$, and so there is no simple
and fast way to get the time when a new region or interval is entered.

As mentioned earlier, in some cases the method described above converges
slowly. This is true for strongly coupled circuits, such as pass transistor
networks, circuits with internal nodes, and circuits with floating capacitors. The
reason is that an approximate waveform representation used in the waveform
iterative technique does not give good convergence for strongly coupled nodes.
In the following subsections various techniques to reduce the computation
needed in solving coupled subcircuits and to ensure convergence are derived.


2.6.1.  Pass  transistor  networks 


From Figure 7 it is clear that the waveform at the output node depends on
the other nodes. Applying the Gauss-Seidel pwl WR method it is found that
more than 10 iterations are required for convergence, which is relatively slow.
An approach which is an extension of the Elmore [20] time constant
approximation for an RC tree to the pwl case is given here.

The Elmore time constant is related to the impulse response of an RC tree.
An RC tree is defined as an interconnection of resistors and capacitors with no
loops. The resistors are restricted to be between nonground nodes only while
the capacitors are only between nonground nodes and the ground. An example
of an RC tree is depicted in Figure 8. The Elmore time constant or the delay is
defined to be

\[
T_e = \sum_{i} R_{ie} C_i
\]

$T_e$ is the time delay of node e, $R_{ie}$ is the sum of the resistances common to
the path between input and node i and to the path between input and node e, and
$C_i$ is the capacitance of node i.
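The definition can be made concrete with a short sketch that evaluates the Elmore time constant for every node of an RC tree. The tree is described by a parent pointer per node, with `resistance[v]` the resistor between node v and its parent (or the input). This data layout, the function name `elmore_delays`, and the element values in the usage example are assumptions made for illustration.

```python
def elmore_delays(parent, resistance, capacitance):
    """Return {node: T_e} with T_e = sum over i of R_ie * C_i.

    parent[v] is the node upstream of v, or None if v connects to the input.
    resistance[v] is the resistor between v and parent[v] (or the input).
    capacitance[v] is the capacitor from v to ground.
    """
    def path(v):
        """Nodes on the path from v back to the input, v included."""
        nodes = []
        while v is not None:
            nodes.append(v)
            v = parent[v]
        return nodes

    delays = {}
    for e in parent:
        on_path_to_e = set(path(e))
        delays[e] = sum(
            # R_ie: resistors shared by the input->i and input->e paths
            capacitance[i] * sum(resistance[v] for v in path(i) if v in on_path_to_e)
            for i in parent
        )
    return delays


# Usage: a two-node chain, input --R1-- node 1 --R2-- node 2.
parent = {1: None, 2: 1}
resistance = {1: 2.0, 2: 3.0}
capacitance = {1: 1.0, 2: 4.0}
print(elmore_delays(parent, resistance, capacitance))
```

For the chain, node 1 sees R1 charging both capacitors, and node 2 additionally sees R2 charging its own capacitor, which is the pattern used for the inverter and pass transistor example in this subsection.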

The Elmore time constant has been used to approximate rising and falling
times of an RC tree, such as the one reported in [21]. The method in [21] does
not work for the pwl approach, since it gives the upper and lower bound of the
waveforms. Also, the function approximation is neither an exponential nor a
straight line, so it is difficult to get the time when the transistor crosses to a
new pwl interval. Consider, for example, a circuit consisting of an inverter and
a pass transistor as shown in Figure 9a. Note that according to the above
equation of the Elmore time-constant, the time-constant at node 1 depends on
whether there is a path between node 1 and node 2. If that is the case, then the
time constant used in solving for node 1 is

\[
T_1 = R_{\text{eq of inverter}} \cdot (C_1 + C_2)
\]

Node 2 is more difficult to analyze. There are three cases to consider. If the pass
transistor is off, there is no change to node 2 (Figure 10a). If there is a resistive
path between node 1 and node 2 (Figure 10b), the time-constant used is

\[
T_2 = R_{\text{eq of inverter}} \cdot (C_1 + C_2) + R_{\text{eq of pass transistor}} \cdot C_2
\]

If the pass transistor is in the saturation region then the Elmore approach is not
applicable because the equivalent resistor of the pass transistor is infinite
(actually the pass transistor is represented by a controlled current source in series
with a resistor, which is in parallel with a current source). Depending on
whether node 2 is being charged up or being depleted, one of the following
equations is used. In the former case (Figure 10c),

\[
C_2 \frac{dv_2}{dt} = \text{current supplied to node 2}
\]


C  2  ~^g£Ce  V:^  Rpass  +  ‘'pcss 

dr  * 

The solution to this equation is either a straight line or an exponential. If node
2 is being depleted (Figure 10d),


C, - -  current  out  of  node  2 

Jr 


'  1  ^  ^pcij  ‘pj.\s 


$v_1$ is held constant at its initial value until a new region is entered. The
solution to this approximation is a straight line. The simulation result in
Figure 9b of the circuit in Figure 9a indicates that the pwl result is reasonably
close to the SPICE result.

Fig. 10 Piecewise linear model of a pass transistor: (a) pass transistor is off;
(b) pass transistor is in the resistive region; (c) pass transistor is in
saturation with node 2 being charged up; (d) pass transistor is in saturation
with node 2 being depleted.
2.6.2. Circuits with internal nodes

The equations of a circuit with internal nodes (e.g., NAND gates) are of the
form (assuming all capacitances are connected to the ground)

\[
\begin{aligned}
C_1 \dot{x}_1 &= a_{11} x_1 + a_{12} x_{i1} + a_{13} x_{i2} + \cdots + a_{1,n+1} x_{in} + w_1 \\
C_{i1} \dot{x}_{i1} &= a_{21} x_1 + a_{22} x_{i1} + a_{23} x_{i2} + \cdots + a_{2,n+1} x_{in} + w_2 \\
&\;\;\vdots \\
C_{in} \dot{x}_{in} &= a_{n1} x_1 + a_{n2} x_{i1} + a_{n3} x_{i2} + \cdots + a_{n,n+1} x_{in} + w_n
\end{aligned}
\]

where $x_{ij}$ (j = 1, 2, ..., n) is an internal node variable and $w_j$ is a constant.

One way of simplifying the above equations is to "lump" all the capacitances at
the output and neglect all internal capacitances. In this case the equations
become

\[
\begin{aligned}
\Big( C_1 + \sum_{j} C_{ij} \Big) \dot{x}_1 &= a_{11} x_1 + a_{12} x_{i1} + a_{13} x_{i2} + \cdots + a_{1,n+1} x_{in} + w_1 \\
0 &= a_{21} x_1 + a_{22} x_{i1} + a_{23} x_{i2} + \cdots + a_{2,n+1} x_{in} + w_2 \\
&\;\;\vdots \\
0 &= a_{n1} x_1 + a_{n2} x_{i1} + a_{n3} x_{i2} + \cdots + a_{n,n+1} x_{in} + w_n
\end{aligned}
\]

Simulation of a nand gate is shown in Figure 11. From the simulation results
we found that if the internal node capacitance is less than one tenth of the
output capacitance our approximation gives reasonable results. If the internal node
capacitances are too large then the direct method is employed.

2.6.3.  Circuits  with  floating  capacitors 

For simplicity, we consider a two-dimensional problem of the form

\[
c_{11} \dot{x}_1 + c_{12} \dot{x}_2 = a_{11m} x_1 + w_{1m} + y_1(t)
\]
\[
c_{12} \dot{x}_1 + c_{22} \dot{x}_2 = a_{21m} x_1 + w_{2m} + y_2(t)
\]

The Elmore time-constant approximation is not applicable in this case since
there is a capacitor between two nonground nodes, and in some cases resistors
may be connected from the nodes to the ground. In this case, when applying the
Gauss-Seidel method to this problem, the "staircase" approximation of the
waveform does not work since the time derivative at the breakpoints is infinite.
Therefore, a ramp approximation is made for falling and rising waveforms. The
derivative of a ramp gives a constant function which is suitable for the
approach described above. For a test circuit a bootstrap circuit as shown in
Figure 12a is used. The waveforms are shown in Figure 12b. The slope of the ramp
approximation is chosen as dv/dt at the midpoint of the rising or falling input.
Four iterations are needed to get convergence.

The above approximation methods have their drawbacks. The method in
which the Elmore time-constant approach is utilized gives good results when
the pass transistor network is small. When the network consists of more than
three pass transistors, the error of the pass transistor voltages becomes large.
Therefore, for pass transistor networks the conventional approach, where the
time derivative is discretized using the Backward Euler method and where
analysis at the linear level is performed, is utilized. Similarly, when the
internal capacitances become large compared to the output capacitance, the
approximation method for nand-gate type circuits as described above becomes less
accurate. To obtain more accuracy the subcircuit needs to be solved in the same
way as it is done by standard circuit simulation; that is, the time derivative is
discretized using the Backward Euler formula, and then the equations are solved at
the linear level.

The Gauss-Seidel waveform relaxation pwl method is fast for
analyzing simple gates such as inverter, nor, and nand gates. The computational
efficiency is due to the fact that there is no need to calculate the voltages at each
time point. As long as the transistors remain in the same regions, the solution of
the equation is either a straight line or an exponential. Another advantage is
that the solution is obtained using the waveform relaxation
approach, which solves the equations at the differential equation level, and
hence there is no need to transform the differential equation into the linear
level. The drawback is that the method works well only for simple circuits
such as the inverter, nor, and nand gates, while for other types of circuits the
method can be very slow. In summary, the first method, the pwl analysis on
simplices, which has good convergence and gives accurate waveforms, is very
slow. The second pwl method, which solves the pwl circuit by inspection, is
fast but limited in the type of circuits that can be accurately solved. Realizing
the drawbacks of the methods described above, a pwl method which is quite
fast but accurate is desirable. To obtain results as accurate as those from stan-


CHAPTER  3 


DYNAMIC  PARTITIONING  APPROACH
FOR  PIECEWISE  LINEAR  CIRCUITS 


3.1.  Introduction 

The two pwl methods described in the previous chapter, namely the fast pwl
method and the pwl method on simplices, require that the strongly connected
components in a circuit be solved either as a whole or using a relaxation
method. Solving the strongly connected components as a whole might be too
expensive because the blocks could be large. The relaxation method is
preferable. However, it is not known where to break the loops of the strongly
connected components (scc) before starting the relaxation so that the number of
relaxation iterations is minimized. The method followed is to cut the loop
randomly, assign the previous values to the corresponding node voltages, and
start the relaxation process. Note that the location of the cut is fixed
throughout the simulation.

In this chapter we will describe a novel way of breaking the strongly connected
components dynamically and naturally, so that the smaller partitioned
subcircuits are manageable for analysis. A review of other methods is given
first.

3.2.  Dynamic  Partitioning

There has been an interest in partitioning large circuits into loosely coupled
subcircuits. Specifically, in [33] the partition of MOS circuits is obtained by
calculating the equivalent conductances and capacitances of two adjacent nodes.
If the calculated values exceed some predetermined values, then the two nodes
are grouped together. This partition is done only once, at the beginning of the
simulation of the MOS circuits. A similar approach for bipolar circuits is
described in [34], except that there the partition is performed dynamically. The
calculation/partition is not done at each iteration, since this would be too
costly. Only when an iteration threshold is exceeded is a repartitioning
performed. At this point it is expected that the speedup in computation
dominates the cost of repartitioning. Recently, a partitioning based on checking
the coupling terms of the following nodal equation has been proposed [18].

$$C_n \frac{dV_n}{dt} + G_n V_n - \sum_{j \in J} C_{nj} \frac{dV_j}{dt} - \sum_{j \in J} G_{nj} V_j = I_n \tag{3.1}$$

where $V_n$ is the voltage at node n, $V_j$ is the voltage at node j, $J$ is the
set of nodes connected to node n, $C_n$ is the sum of capacitances connected to
node n, $C_{nj}$ is the sum of the capacitances connected between nodes n and j,
$G_n$ is the sum of conductances connected to node n, $G_{nj}$ is the
conductance between node n and node j, and $I_n$ is the current source connected
to node n. If the coupling terms $\sum_{j \in J} C_{nj}\, dV_j/dt$ and
$\sum_{j \in J} G_{nj} V_j$ are negligible compared to the right-hand side, then
the coupling between node n and node $j \in J$ is negligible, and therefore the
partition is performed between nodes n and j.
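The coupling test based on (3.1) might be sketched as follows. This is only an illustration under stated assumptions, not the procedure of [18]: the function name `weakly_coupled`, the relative threshold `tol`, and the per-neighbor form of the comparison are all hypothetical.

```python
def weakly_coupled(c_nj, g_nj, dv_j_dt, v_j, i_n, tol=0.01):
    """Return True if the coupling of node n to neighbor j looks negligible.

    c_nj, g_nj   : capacitance and conductance between nodes n and j
    dv_j_dt, v_j : time derivative and value of the voltage at node j
    i_n          : current source at node n (right-hand side of the equation)
    tol          : hypothetical relative threshold (an assumption)
    """
    # Magnitude of the two coupling terms contributed by neighbor j.
    coupling = abs(c_nj * dv_j_dt) + abs(g_nj * v_j)
    # Negligible means small compared with the right-hand side.
    return coupling <= tol * abs(i_n)
```

A partition boundary would be placed between n and j only when the test succeeds for that pair; how the threshold is chosen in practice is not specified here.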
In our case the dynamic partitioning is applied to circuits consisting of
transistors that have been piecewise linearized. The transistor model used is
Meyer's model [25]. The model is piecewise linearized; in each pwl region a
particular type of transistor (load, driver or pass transistor) is represented
by a conductance and a current source. These conductance and current source
values are stored in a table so that during transient analysis a table-lookup
method can be employed. More details of the model are presented in section
2.2.3 and Appendix A. The partition relies on the comparison of integers
indicating the regions of piecewise linearized transistor operation, and it is
done at each iteration of the Gauss-Jacobi or Gauss-Seidel method.
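A table lookup of this kind could look like the sketch below. The breakpoints are the driver breakpoints quoted later in this chapter; the conductance and current-source values in `TABLE`, the name `lookup`, and the numbering of the regions from 1 to 3 are assumptions made only for illustration.

```python
from bisect import bisect_right

# Driver breakpoints quoted in the ring oscillator example (volts).
BREAKPOINTS = [0.0, 1.5, 2.75, 5.0]

# Hypothetical table: one (conductance, current source) pair per region.
# The numbers are placeholders chosen so that the branch current is continuous.
TABLE = [(0.0, 0.0), (1.0e-4, -1.5e-4), (3.0e-4, -7.0e-4)]

def lookup(v):
    """Return (region, conductance, current_source) for a branch voltage v."""
    # The interior breakpoints split the voltage axis into regions 1, 2, 3.
    region = bisect_right(BREAKPOINTS[1:-1], v) + 1
    g, i0 = TABLE[region - 1]
    return region, g, i0
```

Because only the integer region is needed for partitioning, two transistors can be compared by their region numbers without evaluating any currents.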


3.3.  Piecewise  Linear  Dynamic  Partitioning 

Let the system of pwl algebraic-differential equations describing the circuit
be written as

$$C_i \dot{x}(t) = A_i x(t) + b_i + y(t), \qquad i = 1, 2, \ldots, r \tag{3.2}$$

where $C_i \in \mathbb{R}^{n \times n}$ is the matrix representing piecewise
linearized capacitors in the circuit, $A_i$ and $b_i$ represent piecewise
linearized transistors, and $y(t)$ the input waveforms. The subscript $i$
denotes a particular region of the piecewise linearized elements.

Applying an implicit integration formula to (3.2), we get

$$C_i \frac{x_{n+1} - x_n}{h} = A_i x_{n+1} + b_i + y(t_{n+1})$$

The Newton-Raphson iterative scheme is used to solve the nonlinear algebraic
equations. Then at each time step one solves

$$(C_i - hA_i)\, x_{n+1} = h\,(b_i + y(t_{n+1})) + C_i x_n$$

or, equivalently,

$$\left(\frac{C_i}{h} - A_i\right) x_{n+1} = b_i + y(t_{n+1}) + \frac{C_i}{h}\, x_n = s_i \tag{3.3}$$

Consider first the case where $C_i$ is diagonal (no floating capacitors) with
capacitors from nodes to ground. Equation (3.3) can then be written as

$$[A_i]\, x = s_i$$

where the off-diagonal elements of $A_i$ are created by the resistive part of
the circuit. The aim of our dynamic partitioning approach is to order the
circuit variables so that the matrix $A_i$ is block diagonal, with each diagonal
block being as lower triangular as possible. At every iteration point the values
of the off-diagonal elements of $A_i$, and consequently the structure of $A_i$,
are determined by the local and global connectivity of the nodes in the circuit.
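Assuming the Backward Euler form M x_{n+1} = s with M = C/h - A and s = b + y(t_{n+1}) + (C/h) x_n, the assembly of one time step for a diagonal C might be sketched as below. The function name and the list-of-rows matrix layout are illustrative choices, not taken from PLATINUM.

```python
def backward_euler_system(c_diag, a, b, y_next, x_prev, h):
    """Build M and s with M = C/h - A and s = b + y(t_{n+1}) + (C/h) x_n.

    c_diag : diagonal of C (grounded node capacitances)
    a, b   : matrix (list of rows) and vector of the active pwl region
    y_next : input vector at the new time point
    x_prev : solution at the previous time point
    h      : time step
    """
    n = len(c_diag)
    # Off-diagonal entries come only from A because C is diagonal.
    m = [[-a[r][c] for c in range(n)] for r in range(n)]
    s = []
    for r in range(n):
        m[r][r] += c_diag[r] / h
        s.append(b[r] + y_next[r] + c_diag[r] / h * x_prev[r])
    return m, s
```

The zero-nonzero pattern of `m` off the diagonal is exactly that of `a`, which is why the partitioning can be decided from the transistor regions alone.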


The local connectivity of the nodes is then determined by the slopes of the
characteristics of the resistive elements at the iteration point. In the pwl
case, these values depend on the region combination of the characteristic
equations, which in our case are the MOS transistor characteristics. Let us
consider the transistor as a three-terminal device, as shown in Figure 4. The
contribution of a given transistor to the circuit matrix is as follows:

[Equation (3.4): 3 x 3 transistor contribution with rows for the drain D, the
source S and the gate G. The entries of rows D and S are formed from $g_D$,
$g_S$ and $g_D - g_S$; the entries of row G are 0, 0, 0.]

Note that since the gate-to-drain and the gate-to-source capacitances are
ignored, the contribution to the row corresponding to the gate G is zero. Since
there are three regions in each of the two two-terminal pwl branches in the
transistor model as described in Chapter 2 above, there are nine possible region
combinations in which the transistor operates. The values of $g_D$, $g_S$ and
$g_D - g_S$ in each of the regions are listed below.


Table 1 : Transistor connectivity

[Table 1: for each region combination, the columns give REGION, $g_D$, $g_S$,
$g_D - g_S$ and the connectivity (Figures 13a to 13e).]

where the value of $g_{D_2}$ is equal to $g_{S_2}$ and the value of $g_{D_3}$
is equal to $g_{S_3}$. Note that there are only two regions, namely, regions
(2,3) and (3,2), in which the entries in rows D and S in (3.4) are all nonzero.
The local transistor connectivity is then determined by checking the region as
shown in Table I and Figure 13. Note that if the drain or the source is
connected to the ground, only one row in (3.4) needs to be considered.
Consequently, the connectivity of the circuit depends on the operating regions
of the transistors. Hence, the structure or the zero-nonzero pattern of $A_i$
can be determined from the transistor regions without any computation. This fact
leads to the feasibility of performing efficient dynamic partitioning. The local
connectivity in turn affects the global connectivity; that is, the local
interconnection of drain, gate and source defines the overall interconnection of
nodes in the circuit. The global connectivity of the nodes is then determined by
applying a depth-first-search technique [14].
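The two stages, region-based local connectivity followed by a depth-first search for global connectivity, might be sketched as follows. The edge rule used here is an assumed simplification (region 1 is taken to have zero conductance, and the gate is taken to couple to the channel only when the two regions differ); the function names and the tuple layout are hypothetical.

```python
def local_edges(transistors, fixed=frozenset()):
    """Return undirected edges implied by the transistor operating regions.

    transistors : iterable of (drain, gate, source, region_gd, region_gs)
    fixed       : nodes such as ground or supply that never join a block
    """
    edges = set()
    for d, g, s, r_gd, r_gs in transistors:
        pairs = []
        if r_gd != 1 or r_gs != 1:      # assumed: region 1 does not conduct
            pairs.append((d, s))
        if r_gd != r_gs:                # equal regions decouple the gate
            pairs += [(g, d), (g, s)]
        for u, v in pairs:
            if u not in fixed and v not in fixed and u != v:
                edges.add((min(u, v), max(u, v)))
    return edges

def blocks(nodes, edges):
    """Group nodes into connected blocks with an iterative depth-first search."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, block = [start], []
        seen.add(start)
        while stack:
            n = stack.pop()
            block.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        result.append(sorted(block))
    return result

# Ring oscillator drivers (drain, gate, source=0). The regions of the first
# three drivers follow the 9 ns example in the text; the last two are assumed.
drivers = [(2, 6, 0, 1, 1), (3, 2, 0, 2, 3), (4, 3, 0, 3, 2),
           (5, 4, 0, 1, 1), (6, 5, 0, 3, 3)]
partition = blocks([2, 3, 4, 5, 6], local_edges(drivers, fixed={0}))
```

No conductance is evaluated anywhere in this sketch; the partition follows from integer comparisons alone, which is the point made in the paragraph above.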

Because of the nature of digital MOS circuits, the above partitioning produces
a block-diagonal circuit matrix with most of the blocks in lower triangular
form, even for sequential circuits. The partitioning, of course, varies with the
iteration points. We assume that there is a capacitance from every node to
ground; therefore, for a finite time step h the diagonal blocks are nonsingular.
Thus at each iteration, the linear system in (3.4) is in most cases solved in
one sweep using forward substitution, with the possibility of the diagonal
blocks being solved in parallel.
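For a block that is already lower triangular, the one-sweep solution is ordinary forward substitution. A minimal sketch, with an illustrative function name and a list-of-rows matrix:

```python
def forward_substitution(m, s):
    """Solve M x = s for a lower triangular M with nonzero diagonal."""
    x = []
    for r, row in enumerate(m):
        # Only already-computed unknowns (columns left of the diagonal) appear.
        acc = sum(row[c] * x[c] for c in range(r))
        x.append((s[r] - acc) / row[r])
    return x
```

The nonzero diagonal required here corresponds to the assumption in the text that every node has a capacitance to ground.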

When floating capacitors are allowed in the circuit, the matrix can no longer
be in block diagonal or block lower triangular form; hence, one-sweep iteration
is impossible. A typical matrix at one iteration when floating capacitors exist
in the circuit is shown in Figure 14. In this case a combination of dynamic
partitioning and Gauss-Seidel type iteration is employed. The dynamic
partitioning is applied to the transistors in the circuit, assuming that the
small-valued floating capacitors, such as the gate-source and the gate-drain
capacitors, do not exist. A large-valued floating capacitor such as a bootstrap
capacitor is assumed to establish a connection between the nodes where it is
connected. Based on experiments on some circuits, the threshold value is taken
to be the sum of the grounded capacitors to which the floating capacitor is
connected. The floating capacitors that are not included in the partitioning
will create feedback and feedforward terms within and between the diagonal
blocks created by the partitioning. In this case, Gauss-Seidel iteration is used
in solving (3.4).

When a full-blown nonlinear transistor model is used, the method of checking
the local connectivity needs to be generalized. Instead of region comparison,
voltage comparison is performed on each transistor. For example, from Table 1,
in the pwl case one concludes that the gate is independent of the source-drain
part when the gate-drain region is equal to the gate-source region. In the
general case the gate is independent of the source-drain part when the
difference between the gate-source voltage and the gate-drain voltage is within
some tolerance $\Delta V$. Physically, it means that the source-drain current is
independent of the gate voltage when the gate-source voltage is close to the
gate-drain voltage.

As an example consider the simple 5-stage ring oscillator shown in Figure 15.
A worst-case partitioning approach would treat all nodes as one block. In our
case this one block is partitioned into smaller subblocks. A Newton-Raphson
Gauss-Seidel method is used to solve the circuit. Let us consider a piecewise
linearized driver transistor with breakpoints 0, 1.5, 2.75 and 5 (Figure 3) and
similarly a load transistor with breakpoints -5, -1.75, -1 and 0. Assume that a
falling step input is applied and dc values for the nodes have been calculated
[(node, voltage): (2, 0), (3, 5), (4, 0), (5, 5), (6, 0)]. By checking the table
of the driver and comparing the regions of the transistor operation, one
concludes that at this initial state all the nodes are decoupled from one
another. At other iterations the partition changes. For instance, at 9
nanoseconds the voltages of the nodes are [(node, voltage): (2, 2.885),
(3, 2.815), (4, 0.15), (5, 5), (6, 0.15)]. Checking the driver table one obtains
the regions of the driver of the first inverter to be (1,1). The regions of the
driver of the second inverter are (2,3). This indicates that in the matrix the
drain of this driver depends on its gate. Similarly, the regions of the driver
of the third inverter are (3,2) and hence the drain node depends on the gate
node. As a result, the nodes 2, 3 and 4 are in one subblock. By applying the
same procedure one finds that node 5 and node 6 are in two separate subblocks.
Figure 16 shows the partitions at three instances. Figure 16a shows that the
nodes 2, 3 and 4 are in one block which is lower triangular; node 5 is in one
block and node 6 is in another. All blocks are completely decoupled. When the
matrix is solved using the Gauss-Seidel iterative method, dynamic partitioning
and worst-case partitioning have the same effect, in that the voltages are
obtained in one sweep of calculation. Figure 16b shows another partition using
the dynamic partitioning method at a different instance, while Figure 16c shows
a partition using the worst-case one at the same instance as Figure 16b. Using
the dynamic partitioning method (Figure 16b) the solution is obtained in one
sweep. Nodes 6 and 2 are solved together as one block while the rest of the
nodes are in separate blocks. On the other hand, using the worst-case
partitioning approach (Figure 16c) would require more than one iteration due to
the existence of the upper-diagonal element. Figures 17a and 17b show the
corresponding graph representation of Figures 16a and 16b, respectively.
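The effect of a single upper-diagonal element on the number of Gauss-Seidel sweeps can be seen in a small experiment. The two 3 x 3 matrices below are made-up examples, not the matrices of Figure 16, and the residual-based stopping rule is an assumption of this sketch.

```python
def gauss_seidel_sweeps(m, s, tol=1e-9, max_sweeps=200):
    """Run Gauss-Seidel on M x = s from x = 0; return (x, sweeps used)."""
    n = len(s)
    x = [0.0] * n
    for sweep in range(1, max_sweeps + 1):
        for r in range(n):
            acc = sum(m[r][c] * x[c] for c in range(n) if c != r)
            x[r] = (s[r] - acc) / m[r][r]
        # Stop as soon as the residual of every equation is below tol.
        residual = max(abs(sum(m[r][c] * x[c] for c in range(n)) - s[r])
                       for r in range(n))
        if residual <= tol:
            return x, sweep
    return x, max_sweeps

lower = [[2, 0, 0], [-1, 4, 0], [0, -2, 5]]    # lower triangular ordering
upper = [[2, 0, -1], [-1, 4, 0], [0, -2, 5]]   # one upper-diagonal element
rhs = [2, 3, 8]
_, sweeps_lower = gauss_seidel_sweeps(lower, rhs)
_, sweeps_upper = gauss_seidel_sweeps(upper, rhs)
```

With the lower triangular ordering a single sweep is exact, while the matrix with the extra upper-diagonal entry needs several sweeps, which is the behavior the ring oscillator example describes.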

(node, initial voltage) (2, 2.885) (3, 2.815) (4, 0.15) (5, 5.0) (6, 0.15)
time 0.9000d-08 nodes (2, 3, 4, 5, 6)
node voltages 0.2885d-01 0.2626d+01 0.3287d+00 0.5000d+01 0.1500d+00

Fig. 16a Matrix of the ring oscillator at 9 ns

(node, initial voltage) (2, 4.867) (3, 0.15) (4, 4.125) (5, 0.1684) (6, 1.663)
time 0.2300d-07 nodes (6, 2, 3, 4, 5)
node voltages 0.1663d-01 0.4765d+01 0.1500d+00 0.4109d+01 0.1684d+00

Fig. 16b Matrix of the ring oscillator at 27 ns (dynamically partitioned)

Fig. 16c Matrix of the ring oscillator at 27 ns (worst-case partitioned)

To reduce the computation time even further, the nonactive partitioned
subcircuits could be identified. The nonactive (latent) subcircuits do not need
analysis. The active subcircuits consist of transistors that do not change
regions but whose terminal nodes are active. An active node of a circuit is one
that

violates at least one of the following [2]:

(1) $|v_m(t_n) - v_m(t_{n-1})| \le \epsilon_a + \epsilon_r \max(|v_m(t_n)|, |v_m(t_{n-1})|)$, $m = 1, 2, \ldots$

where $\epsilon_a$ and $\epsilon_r$ are the absolute and relative error
tolerances for voltages.

(2) $|I_m(t_n) - I_m(t_{n-1})| \le \epsilon_c + \epsilon_r \max(|I_m(t_n)|, |I_m(t_{n-1})|)$, $m = 1, 2, \ldots$

where $\epsilon_c$ is the absolute error tolerance for current.

(3) $h_{n-1}\, \dfrac{\cdots}{Q_m(t_n) - Q_m(t_{n-1})} > 1$, $m = 1, 2, \ldots$

where $h_{n-1}$ is the time step taken by the program at $t_{n-1}$ and $Q_m$ is
the charge of the capacitor at node m.

In timing analysis only the first and second rules are checked.
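As used in timing analysis, the latency check reduces to the voltage and current rules. The sketch below assumes they have the usual absolute-plus-relative tolerance form, |x(t_n) - x(t_{n-1})| <= eps_abs + eps_r max(|x(t_n)|, |x(t_{n-1})|); the function name and the default tolerances are placeholders, not values from the thesis.

```python
def is_latent(v_now, v_prev, i_now, i_prev,
              eps_a=1e-3, eps_c=1e-6, eps_r=1e-3):
    """Return True if a node satisfies rules (1) and (2), i.e. it is latent.

    eps_a, eps_c : absolute tolerances for voltage and current (placeholders)
    eps_r        : relative tolerance (placeholder)
    """
    rule1 = abs(v_now - v_prev) <= eps_a + eps_r * max(abs(v_now), abs(v_prev))
    rule2 = abs(i_now - i_prev) <= eps_c + eps_r * max(abs(i_now), abs(i_prev))
    return rule1 and rule2
```

A node for which `is_latent` returns False is active, and any subcircuit containing it has to be analyzed at the current time point.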

As an example let us consider the ring oscillator again. The following table
shows how partitions change during the solution process. The numbers in
parentheses are the node numbers that are in the same block; for example, (2,3)
indicates that node 2 and node 3 are in the same subcircuit.

Table II : Partitions of the ring oscillator circuit

Time    Iteration    Partition
0 ns    0            (2), (3), (4), (5), (6)
        1            (2), (3), (4), (5), (6)
5 ns    0            (2), (3), (4), (5), (6)
        1            (2,3), (4), (5), (6)
        2            (2,3), (4), (5), (6)
9 ns    0            (2,3), (4), (5), (6)
        1            (2,3,4), (5), (6)
        2            (2,3), (4), (5), (6)
        3            (2,3), (4), (5), (6)

There are two important numerical processes that can be deduced from the
table.

1. Repartitioning.

From time 0 to 5 ns the partition stays the same. At 5 ns the partitioning
changes, and stays the same until 9 ns when the partitions change again.
Only those transistors that change regions are included in the
repartitioning process.

2. Analysis.

All subcircuits that go through repartitioning must be analyzed. The
subcircuits that are not repartitioned but whose node voltages change
considerably must also be solved. The rest of the circuit, which is not
repartitioned and is latent, need not be solved. For example, in the ring
oscillator above, from time 0 to 5 ns, although the partitioning stays
unchanged, nodes 2 and 3 start oscillating while nodes 4, 5 and 6 remain
latent. This means that only the voltages of nodes 2 and 3 need to be solved.
The ring oscillator, which is analog in nature, represents an extreme case:
as time advances, all node voltages change. Digital circuits typically
exhibit a greater degree of latency.

The above dynamic partitioning approach is performed on top of worst-case
partitioning, which is performed once at the preprocessing step. The worst-case
partitioning is necessary to determine which parts of a circuit are large enough
to require dynamic partitioning. Worst-case partitioning, which is also known as
partitioning into dc-connected subcircuits, is based on worst-case transistor
local connectivity as shown in Figure 13e.

CHAPTER  4


PARALLEL-VECTOR  IMPLEMENTATION  OF 
PIECEWISE  LINEAR  DYNAMIC  PARTITIONING  METHOD 


In the last few years there has been a growing interest in developing CAD tools
to run on parallel and on vector computers. The idea of parallel computation is
that using N processors a program should run N times faster than if only one
processor is used. In reality, the computing speedup is often smaller than the
theoretical one. The idea of vector or pipeline computers is that by dividing a
task into subtasks and by maintaining a flow of operand pairs in the analysis
process the speedup can be increased.

To utilize the maximum capability of a vector and/or parallel computer, one
needs to use the appropriate languages and algorithms. From the user's point of
view, very little can be done about the language, since usually it is given by
the manufacturer, who has already tailored the language to the specific
architecture of the machine. Given a particular machine architecture, one needs
to design algorithms that can provide the best possible results.

The dynamic partitioning method described in Chapter 3 is well suited for
implementation on a parallel machine with shared memory. The reason is that
during the iterations exchanges of vertices and nodes among the blocks in the
graph representing the circuit occur. In other words, there are exchanges of
transistors among the partitioned subcircuits. Implementation of the method on a
parallel machine with local memory would have a high cost of intercommunication
among the processors. The machine used in this study is the Alliant FX/8. It is
a parallel-vector machine with 8 processors and a shared memory and, therefore,
is suitable for implementation of the dynamic partitioning method.

The main iteration loop of the dynamic partitioning method consists of
determining local connectivity, determining global connectivity and solving each
partitioned subcircuit block. Figure 18 depicts the steps. The box enclosed by
broken lines will be explained later; it is modified later to make the approach
more efficient. Each process can be done in parallel as described in the
following paragraphs. A general approach applied to each of the processes is
described first.

    Start
    Input circuit topology, circuit parameter values, initial voltage values
    Determine companion models of nonlinear elements
    Local connectivity checks (region checks)
    Global connectivity (Vishkin algorithm)
    Solve
    Determine new companion models of nonlinear elements
    Local connectivity checks (region checks)
    New regions = Old regions ?
    t = t + h
    Global connectivity (Vishkin algorithm)

Fig. 18 Flowchart I : dynamic partitioning algorithm

In general, complete parallel vectorization is not feasible. Since
vectorization of a loop prohibits subroutine calls, only parallelization is
useful for most cases. The parallelization on the Alliant is performed by
setting up a do loop, with a directive for concurrency. A typical format is as
follows:

cvd concur
      do 1 i=1,n
         call routine
    1 continue

The loop contains a call to a routine that does a task. The concurrency is
automatic; that is, no particular assignment of processors is necessary. Each of
the available processors performs a subroutine call. When one processor finishes
a job, it automatically performs another call until all n calls are completed.
Each routine inside the concurrent do loop, in general, contains a set of common
blocks of global variables and a set of local variables. When concurrency is
invoked, a stack is created so that each local variable has n different copies,
where n is the number of processors (at present n=8 on the Alliant). Each of the
global variables is potentially accessible by many processors at the same time.
This is not desirable because incorrect values would be stored. A lock is
applied during the execution of the code that updates the global variable to
prevent concurrent execution by multiple processes. A special feature that the
lock must have is that a single instruction must both check whether a variable
is free and, if it is, set (or lock) the variable. This is important since, if
the setting is not done instantly, another processor might consider the variable
to be free and attempt to set it.
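The pattern of a self-scheduled concurrent loop with a locked update of a shared variable can be imitated in Python as a rough analogue. This is not Alliant Fortran: the thread pool stands in for the concurrent do loop, `threading.Lock` stands in for the atomic test-and-set lock, and the names `routine`, `total` and `concurrent_loop` are illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

total = 0                       # shared "global" variable
total_lock = threading.Lock()   # acquiring it tests and sets atomically

def routine(i):
    """One loop iteration: local work, then a locked update of the global."""
    global total
    local = i * i               # local variable, private to this call
    with total_lock:            # only one worker updates total at a time
        total += local

def concurrent_loop(n, workers=8):
    """Analogue of the concurrent do loop: idle workers pick up the next i."""
    global total
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(routine, range(1, n + 1)))
    return total
```

As in the text, no iteration is assigned to a particular worker in advance; whichever worker becomes free takes the next call, and the lock keeps the shared update consistent.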

The determination of local connectivity, global connectivity and solution of
the variables follows the pattern described in the above paragraph. For local
connectivity the loop is

cvd concur
      do 2 i=1,number of transistors
         call mostbl
    2 continue

The input parameters to mostbl are the voltages of source, gate and drain, and
the outputs are the regions of gate-drain and gate-source and the associated
conductances and current sources. In this routine the connectivity of each
transistor is determined and the corresponding edges between the source, drain
and gate nodes are created (if applicable). These edges are needed for global
connectivity determination.

The global connectivity is determined after transistor connectivity is
completed. Before explaining the implementation details, parallel algorithms to
determine the connectivity of a graph are described next. Hirschberg et al.
proposed a method that solves the connected component problem of an undirected
graph in $O(\log^2 n)$ time using $n^2/\log n$ processors [52], where n is the
number of nodes in the graph. A variation of the method which requires an even
smaller number of processors, $\max(n, e)$, is given in [53], where e is the
number of edges in the graph. Another algorithm that determines the connected
components of an undirected graph, and uses an approach that is different from
the ones above, is proposed by Shiloach and Vishkin [54]. This algorithm
determines the connected components in $O(\log n)$ time, but it requires
$2e + n$ processors. Since the number of processors in a parallel computer is
bound to increase in the future, this algorithm, which requires more processors
but determines the connected components in shorter time, is chosen for our work.
Another advantage is that the amount of temporary working memory in this case,
$O(\log n)$, is much smaller than for the one proposed in [52], which is
$O(n^2)$. Such a memory space requirement can be prohibitive when the size of
the circuit is large.


The input to the algorithm of Shiloach and Vishkin consists of

- the vertices, represented by the numbers 1, ..., n

- the edges, specified by a vector e of length 2e in which edge (i,j) appears
  as a pair of directed edges <i,j> and <j,i>.

The output is a vector D[1:n] where D[i] points to the root node to which
vertex i is connected.

A temporary memory Q of length n is needed. The two main operations of the
algorithm are

(a). Shortcutting : decreasing the height of a tree

(b). Hooking : reducing the number of trees.

An informal description of the algorithm is given first, followed by the more
formal one. The notation $D_s(i) = j$ means that vertex i points to vertex j
after the s-th iteration. Initially, each vertex points to itself, that is,
$D_0(i) = i$ for i = 1, ..., n.


Informal description of the algorithm :

Step 1 : First shortcutting $D_s(i) \leftarrow D_{s-1}(D_{s-1}(i))$:

If, in the (s-1)th iteration, node i points to some node j and node j points
to another root k, then after the s-th iteration, point node i to node k
(shortcutting).

Step 2 : Hooking trees onto smaller vertices of other trees:

For all vertices that point to a root at the end of the previous iteration,
check if their neighboring vertices point to smaller vertices. If such a
neighboring vertex j exists for a particular vertex i, then hook the tree to
which i belongs onto $D_s(j)$.

Definition: A tree is stagnant in the s-th iteration if it has not been changed
in the first two steps of this iteration; that is, it has not been subjected to
any shortcut operation, no tree has been hooked onto it, and it has not been
hooked onto any other tree. A root of a stagnant tree is a stagnant root.

Step 3 : Hooking stagnant trees:

For all processors of vertices that point to a stagnant root, check if their
neighbors point to a vertex of another tree. If such a vertex j is found, hook
its tree onto $D_s(j)$.

Step 4 : Second shortcutting $D_s(i) \leftarrow D_s(D_s(i))$:

Same as step 1.

A graphical procedure is given next.

Initially, each node points to itself.

After the hooking operation (on the edges), the nodes start to point to their
neighbors with smaller node numbers.

Then after the shortcutting operation (on the nodes), the nodes point to other
nodes further down.

The hooking and shortcutting operations are repeated until finally all the nodes
point to the root.

A  more  complete  description  of  the  algorithm  and  the  necessary  arrays 
used  arc  giver,  next.  The  algorithm  contains  the  vector  Q  which  satisfies  : 

Q(t  =  s  if  after  the  second  step  of  the  s  iteration  ’here  exists  at  least  one 

ver’ex  ;  pointing  to  i  that  does  not  point  to  i  al’c-r  the  (s-1  )th  itera- 


■  am. 


t. 


Q(i)  <  s  otherwise. 


Step  0  :  Initialization.  D0{ i)  «—  i.  Q(i)  <—  0.  s  «—  1.  s’ «—  1 

In  the  following  steps,  i  ^  n  indicates  that  the  processors  are  work¬ 
ing  on  the  nodes,  and  i  >  n  indicates  that  the  processors  are  accessing 
the  pairs  of  edges  (ij.z,). 

While  s'=s  do 
Step  1  :  If  i  ^  n 


then  DSU)  D^iD^U)) 
if  Ds(i  Ds_}(i ) 


then  Q(Z>  (i  ))  *-  s 


Step  2  :  If  i  >  n 


then  if  DSUX)  =  Z>s_1(z 3 ) 


then  if  Ds{i 2)<DS(: , ) 

then  Ds{Ds(ix))  —  DSU2) 

Q{Ds(i2))  ♦-  s 

Comment  :  If  Ds  ( /j )  has  not  been  changed  in  Step  1.  that  is.  it  has 
pointed  to  a  root,  then  the  processor  checks  if  :2  is  pointing  to  a  smaller 
vertex.  If  that  is  the  case,  then  it  hooks  the  root  which  is 
DjUj)  onto  Ds(i\).  Simultaneously,  all  the  processors  for  which 
Ds(j  )  =  Ds(i ,)  and  Ds(k  )  <  DSU ,)  try  to  update  £>s  (  Ds  (;,)). 


S*ep  3  :  If  i  >  n 


then  if  Ds( i } )  =  DsiDs{il  J)  and  <  s 


then  if  ( /, )  =*  Dsii«) 


r 


i 
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then  Ds ( Z)5 ( / j ) !  .—  i)^,) 

Comment  :  The  processor  checks  if  Z^D,)  is  a  root.  II  so,  it  checks  by 
using  Q  if  it  is  a  stagnant  root  and  if  it  is  so  it  tries  to  hook  it  onto 
another  tree.  This  is  tried  simultaneously  by  the  processors  such  that 
Ds(j)  =  DSi r>  =*  Ds{k). 

Step  4  :  If  i  ^  n 

then  Ds(i  )<- £>S(Z)S  (j  )) 

Step  5  :  If  i  <  n  and  QCi)  =  s 
then  s’  ♦—  s  + 1 
s  <—  s-1 

end  while 

Comment : As soon as all the trees are stagnant, $Q(i) < s$ for all $i$,
$1 \le i \le n$, and thus $s'$ will not be incremented while $s$ is incremented,
and the algorithm terminates.

In steps 1, 4 and 5 the concurrency is across the nodes. In steps 2 and 3 the
concurrency is applied to the edges.
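
The hook-and-shortcut idea behind the steps above can be illustrated with a
small sequential sketch. This is not the parallel algorithm itself: it drops
the stagnancy array Q and the synchronous processor model, and simply repeats
a hooking pass over the edges and a pointer-jumping pass over the nodes until
nothing changes. The function name and the 0-based vertex numbering are
assumptions made for the illustration.

```python
def connected_components(n, edges):
    """Label each vertex 0..n-1 with the smallest vertex of its component.

    Simplified, sequential hook-and-shortcut sketch (assumption: this is
    only an illustration, not the synchronous parallel algorithm).
    """
    parent = list(range(n))          # D(i) <- i
    changed = True
    while changed:
        changed = False
        # Hooking pass (one "processor" per edge): hook a root onto a
        # smaller vertex found at the other end of the edge.
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if parent[ru] == ru and rv < ru:
                parent[ru] = rv
                changed = True
            elif parent[rv] == rv and ru < rv:
                parent[rv] = ru
                changed = True
        # Shortcut pass (one "processor" per node): D(i) <- D(D(i)).
        for i in range(n):
            if parent[i] != parent[parent[i]]:
                parent[i] = parent[parent[i]]
                changed = True
    return parent


labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
print(labels)
```

Because a root is only ever hooked onto a smaller vertex, every pointer
decreases monotonically, so the loop terminates with each tree collapsed to a
star rooted at the smallest vertex of its component.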

During iterations the nodes in the graph remain the same while the edges
change. The result is a graph that is repartitioned into blocks, where each
block is solved using one processor. A block consists of a root node and its
corresponding leaves. The number of leaves varies from none ( only 1 node in
the block ) to n, where n is the number of nodes in the circuit. In the program
the loop for solving the blocks is

cvd$  concur
      do 3 i=1,number of blocks

 3    continue


The  output  of  the  solve  routine  is  the  node  voltages. 

The  next  step  is  to  concurrently  do  table-lookup  for  the  transistors  with 
the  new  voltages.  If  the  resulting  new  gate-drain  and  gate-source  regions  are 
the  same  as  the  old  ones,  and  if  the  new  node  voltages  are  within  a  tolerance  of 
the  old  node  voltages,  then  the  solution  is  found.  Time  is  then  incremented  by 
an  automatically  determined  time  step.  Otherwise,  the  iteration  (  local  connec¬ 
tivity,  global  connectivity,  and  solve  )  is  repeated  until  convergence  is 
obtained.  Note  that  repartitioning  is  only  done  during  the  dc  solution  phase. 
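
The convergence test described above can be sketched as follows. This is a
minimal illustration only; the dictionary layout, the function name and the
tolerance value are assumptions, not taken from the program.

```python
def has_converged(old_regions, new_regions, old_volts, new_volts, tol=1e-3):
    """Return True when regions are unchanged and voltages agree within tol.

    old_regions/new_regions map a transistor name to a pair of integers
    (gate-drain region, gate-source region); old_volts/new_volts map a node
    name to its voltage. All names and the default tol are hypothetical.
    """
    if old_regions != new_regions:
        return False
    return all(abs(new_volts[n] - old_volts[n]) <= tol for n in new_volts)


regions = {"m1": (2, 3)}
print(has_converged(regions, {"m1": (2, 3)}, {"out": 4.9995}, {"out": 5.0}))
```

When the test fails, the local connectivity, global connectivity and solve
steps would be repeated before time is advanced.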

During transient analysis the circuit could be repartitioned a large number
of times, since repartitioning is potentially carried out at each iteration. It is
then desirable to reduce the cost of repartitioning as much as possible. The
algorithm described in the previous paragraphs repartitions the entire circuit.
Since only a small part of the circuit experiences region changes, and therefore
edge changes, the repartitioning needs to be performed only on this changing
part. A modification of the original Shiloach and Vishkin algorithm which only
repartitions part of the circuit is described next.

Modified Shiloach and Vishkin algorithm :

Once the Shiloach and Vishkin algorithm is applied to the entire circuit, the
repartitioning is performed on selected parts of the circuit as follows :

Step 1. The gate-drain and gate-source edges of a transistor at the
(n-1)th iteration are compared to the ones at the nth iteration. If
different, then check if the root of the source has been flagged. The
flag indicates if the root has been added to the list of nodes that need
repartitioning. If the root node has not been flagged, flag it and add
the root node to the list. The same checking is performed on the root
of the drain ( and the gate if necessary ).

Step 2. Each root node in the list has pointers to the list of transistors.
These transistors have their source and drain nodes as the leaves of
the root node. The edges from these transistors represent edges ( that
is, ordered pairs <i,j> where i and j are the nodes connected to the
edges ) in the Shiloach and Vishkin algorithm. The original algorithm
considers all the edges in the graph to be partitioned. The nodes of
the transistors are the vertices in the algorithm. Again, in the
original algorithm, all nodes in the graph are considered.
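
Step 1 above can be sketched as a flag-and-collect pass over the transistors.
The data layout below (dictionaries keyed by transistor and node names, a set
used as the flag) is an assumption made for the illustration; the gate
terminal is left out for brevity.

```python
def roots_to_repartition(old_regions, new_regions, terminals, root_of):
    """Collect the roots whose trees contain a transistor that changed region.

    old_regions/new_regions: transistor -> (gate-drain, gate-source) integers.
    terminals: transistor -> (source node, drain node).
    root_of: node -> root of the tree the node currently belongs to.
    All names are hypothetical.
    """
    flagged = set()      # plays the role of the per-root flag
    worklist = []        # list of roots that need repartitioning
    for name, new in new_regions.items():
        if old_regions.get(name) == new:
            continue                      # edges unchanged, nothing to do
        for node in terminals[name]:
            root = root_of[node]
            if root not in flagged:       # flag once, add once
                flagged.add(root)
                worklist.append(root)
    return worklist


old = {"m1": (0, 1), "m2": (2, 2), "m3": (1, 1)}
new = {"m1": (0, 1), "m2": (2, 3), "m3": (0, 1)}
terms = {"m1": ("a", "b"), "m2": ("b", "c"), "m3": ("c", "d")}
roots = {"a": "a", "b": "a", "c": "c", "d": "e"}
print(roots_to_repartition(old, new, terms, roots))
```

Only the trees rooted at the returned nodes would then be handed to the
connectivity algorithm of Step 2.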

An example showing the method is given in Figure 19, where the circles are the
nodes, a circle enclosing a star is a root, and the solid lines are the edges of
the graph. The edges are created during local connectivity checks of the
transistors. The broken lines with dots are also edges; however, these are either
new edges created or old edges removed on the nth iteration. The directed broken
lines are pointers created during the Shiloach and Vishkin algorithm. On the top
figure ( nth iteration ) there are three subgraphs with three roots. During the
nth iteration one edge is deleted and one edge is created. These edge changes
affect only two of the subgraphs. Therefore, the repartition is performed only on
those collections of edges and nodes. The third subgraph is not affected by edge
changes, so there is no need to repartition this part. The resulting repartitioned
graph at the (n+1)th iteration is shown at the bottom of Figure 19.

Besides repartitioning only the necessary parts, the computation time can
be reduced even further by analyzing only the active subcircuits. Selecting the
active subcircuits is explained in the preceding chapter. An example showing
parts that need repartitioning ( and analysis ) and those that do not need
repartitioning but require analysis is shown in Figure 20.


The symbols of solid lines, broken lines, broken lines with dots, circles and
circles enclosing stars in Figure 20 have exactly the same meaning as the ones in
Figure 19. At the top of Figure 20 is the graph at the nth iteration. There are
four subgraphs with four roots. Note that there is one edge being formed at the
nth iteration. The two subgraphs affected by the new edge are repartitioned, while
the other two subgraphs do not have any edges deleted or created; hence, no
repartition is necessary on these subgraphs. However, one of these two subgraphs
contains active transistors. This particular subgraph is solved ( no
repartitioning ) and the other subgraph is neither repartitioned nor solved. The
new graph at the (n+1)th iteration is shown at the bottom of Figure 20.

Flowchart II, showing the modifications, is shown on page 85. The
modification is done to the box enclosed by broken lines on Flowchart I, which
is shown on page 73. Filter I separates the roots at the end of the ith iteration
into two groups, one containing roots affected by changes of edges. This is the
group that is repartitioned using the Shiloach and Vishkin algorithm and is later
solved. The rest of the roots are partitioned even further into two groups, one
containing roots with some active transistors. This group is later solved. The
rest of the roots are latent subcircuits that are returned to Filter I to be
checked later if new edges formed affect these roots.

Flowchart II ( box contents ) :

New regions = Old regions ?

Filter I : separate roots into 2 groups.
Group A is for roots with some leaves that change regions. These go to path A.
Group B is for the rest of the roots. These go to path B.

Vishkin algorithm to determine connectivity.

Filter II : separate these roots into 2 groups.
Group C is for roots with some active transistors. These go to path C.
Group D is for the rest of the roots ( latent ). These go to path D.

Latent subcircuits ( not to be included in Solve )
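
The two filters can be sketched as a pair of list partitions. The function and
argument names are assumptions made for the illustration; the group letters
follow the flowchart boxes.

```python
def classify_roots(roots, changed_roots, active_roots):
    """Split roots into group A (repartition and solve), group C (solve only)
    and group D (latent). Names are hypothetical.

    changed_roots: roots with some leaves that changed regions (Filter I).
    active_roots: roots whose trees contain active transistors (Filter II).
    """
    group_a = [r for r in roots if r in changed_roots]          # Filter I
    group_b = [r for r in roots if r not in changed_roots]
    group_c = [r for r in group_b if r in active_roots]         # Filter II
    group_d = [r for r in group_b if r not in active_roots]
    return group_a, group_c, group_d


a, c, d = classify_roots(["r1", "r2", "r3", "r4"], {"r1", "r2"}, {"r2", "r3"})
print(a, c, d)
```

A root that is both changed and active lands in group A, since Filter I is
applied first.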

As  mentioned  before,  unlike  on  the  uniprocessor,  on  the  parallel  processors 
the  partitioning  is  applied  to  the  entire  circuit.  Most  of  the  time  the  number  of 
nodes  in  one  connected  component  is  less  than  three.  For  these  small-size  con¬ 
nected  components  a  direct  method  is  used  to  solve  for  the  unknown  variables. 
For  larger  connected  components  (  number  of  nodes  larger  than  three  )  the 
blocks  are  made  as  lower  triangular  as  possible  by  applying  Tarjan's  depth-first 
search  method  [24]  (  to  obtain  strongly  connected  components  within  the  large 
blocks  )  followed  by  the  analysis  sequencing  method  described  in  Chapter  2. 
These  nearly  lower  triangular  blocks  are  then  solved.  An  example  of  the 
resulting  matrix  is  shown  in  Figure  22.  As  mentioned  earlier  the  large  blocks 
do  not  occur  often  in  circuits  that  we  simulated. 

In  summary,  the  circuit  is  decomposed  into  blocks  where  each  block  is 
solved  using  the  direct  method  by  one  processor.  If  the  size  of  the  block  is  small 
no  reordering  is  done.  If  the  size  is  large  then  the  block  is  made  as  lower  tri¬ 
angular  as  possible  and  then  solved  using  one  processor. 

An alternative approach is to solve one block using all the available
processors; that is, the unknown variables in one block are determined in parallel.
Parallel numerical linear algebra such as described in [50] is needed. The
drawback of this method is that in many cases the sizes of the blocks are small.
This means there are more processors assigned to a block than needed to solve for
the unknowns in that block. Therefore, there would be idle processors most of the
time.

If floating capacitors exist in the circuit, then in the matrix they create
feedback and feedforward terms within and between the diagonal blocks created
by the partitioning. Unlike in the uniprocessor case, where the Gauss-Seidel
relaxation method is employed, in the parallel case the Gauss-Jacobi relaxation
method is applied.

CHAPTER 5

IMPLEMENTATION AND RESULTS

All  three  algorithms  described  in  the  preceding  chapters  have  been  imple¬ 
mented  in  computer  programs  to  run  on  VAX  11/780  and  SUN  workstations. 
The parallel implementation of the dynamic partitioning method is for the Alliant
FX/8, a parallel-vector computer with 8 processors. The programs are written
in  FORTRAN  and  each  has  over  7800  lines  of  code. 

The input file containing MOS network descriptions is similar to the one
for MOTIS-C, except in our case the MOS network description can be at the
transistor level or the predefined subcircuit level. The predefined subcircuits are
nand, nor, inverter, and-or-inverter, and pass transistor net.

For the uniprocessor implementation the following steps are performed in
the preprocessing stage. For each type of device a pwl table is generated
automatically. If no device information is given, then default values for typical
long channel devices are used. Next, the circuit is partitioned into dc-connected
subcircuits [15]. If there exist floating capacitors, then each one is checked to
see if it is larger than the sum of the grounded capacitors to which the floating
capacitor is connected. If it is, a dc-path is assumed to exist between the two
nodes for partitioning purposes. If the capacitors are pwl, worst-case values are
assumed. This worst-case partitioning is performed only once and is based on the
worst-case graph condition of the transistors (Figure 13e). Each subcircuit
representing a predefined or a dc-connected component is replaced by a node in
a graph representing the circuit. The strongly connected components of the
graph are identified using Tarjan's depth-first search described in Chapter 4.
Then, analysis sequencing is performed on the new acyclic graph where each scc
has been replaced by a new node.

The strongly connected components are solved using the dynamic partitioning
method, while each subcircuit of the rest of the circuit is solved using the
direct method. If the user knows that some simple subcircuits, such as nand, nor,
inverter, and-or-inverter, are not a part of an scc, then in the input file the
user can specify these simple subcircuits as gates. This causes the program to
solve those simple gates using the fast pwl method described in Chapter 2. The
dynamic partitioning method automatically partitions the scc into smaller,
completely decoupled blocks. In the cases where the blocks are too large ( size
of block is larger than three ) those blocks are made as lower triangular as
possible by applying Tarjan's depth-first search approach and the analysis
sequencing method described in Chapter 4. In almost all cases in practice the
dynamic partitioning breaks the feedback paths in the scc.

Information on the regions of transistors needed for dynamic partitioning is
obtained during the equation formulation process when the conductances and
current sources are fetched from the pwl device tables. Based on this
knowledge of regions, the program determines the local connectivity of the
transistors. This local connectivity in turn is used to determine the global
connectivity of the transistors in the scc by applying depth-first search [14].
This depth-first search is not costly, since the size of an scc is usually not
large. The subblocks are now solved in a sequence which usually does not include
any feedback, and thus convergence is obtained in one sweep.
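
As an illustration of the scc identification step, the following is a compact
recursive sketch of Tarjan's depth-first search on a directed graph given as
an adjacency dictionary. The graph representation and the names are
assumptions; the program itself is written in FORTRAN and its data structures
are not shown in the text.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps a node to a list of successor nodes.

    Components are returned in reverse topological order of the condensed
    graph, so reversing the result gives one valid analysis sequence.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    result = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            result.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return result


# Hypothetical subcircuit graph: b and c drive each other (feedback).
circuit = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}
components = strongly_connected_components(circuit)
print(list(reversed(components)))
```

In this example the feedback pair b, c forms one scc, and the reversed list
orders the blocks so that every block is analyzed after the blocks that
drive it.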

Waveforms of some examples are shown. The first one is a 5-stage ring
oscillator circuit (Figure 15) containing floating capacitors. The example is used
to show that fairly accurate results are obtained by the pwl method.
Waveform SPICE is obtained by SPICE using the level 1 model with external
capacitances between any two adjacent nodes included. Waveform PWLFULL is
obtained by using the pwl approximation and solving the entire circuit without
partitioning or relaxation. Waveform PWLRELAX is obtained by dynamic
partitioning and relaxation iteration to take into account the effects of floating
capacitances between subcircuits. From the figures one can conclude that the
pwl method gives accurate waveforms.

The  second  example  is  a  tally  circuit  (Figures  24-25).  Worst-case  parti¬ 
tioning  would  define  the  entire  circuit  as  one  block  while  dynamic  partitioning 
decomposes  the  circuit  into  small  subblocks  that  can  be  solved  separately;  as  a 
result,  computation  time  is  reduced. 

The third example is the 10-stage inverter circuit (Figure 26). The output
of the first, fourth, seventh and tenth inverters, together with SPICE
waveforms, are shown in Figure 27. In this example the circuit is specified as
inverter gates and the fast pwl method is applied. Note that for simple gates
such as an inverter the fast pwl method is fairly accurate.

The  fourth  example  is  a  full-adder  circuit  containing  pass  transistors  (Fig¬ 
ure  28).  The  waveforms  of  the  sum  and  carry  nodes  are  shown  in  Figure  29. 


The  last  example  is  the  pla  circuit  (Figure  30).  The  waveforms  of  the  out¬ 
put  of  the  last  inverters  of  the  pla  and  SPICE  waveforms  are  shown  in  Figure 
31.  This  pla  contains  a  strongly  connected  component  which  is  solved  using  the 
dynamic  partitioning  method.  The  rest  of  the  circuit,  which  consists  of  simple 
gates,  is  solved  using  the  fast  pwl  method. 

It can be seen from the figures that the pwl approximation is quite accurate
compared to SPICE.

The computation time as compared to SPICE is shown in the following table.


Table III : Comparison of the pwl method and SPICE

                                                 Analysis time
circuit                        devices   dynamic        no dynamic     SPICE
                                         partitioning   partitioning

5-stage ring oscillator          11      1.10s          1.417s         50.13s
( no floating capacitor )
5-stage ring oscillator          11      1.17s          3.000s         45.20s
( with floating capacitor )
tally circuit                    18      2.550s         3.167s         132s
pla                             149      6.383s         14.22s         977s
cmos alu                        142      4.171s         22.77s         n.c.

n.c. : no convergence


The table shows simulation results performed on some circuits. One observes
that computation time is reduced when the dynamic partitioning is applied to
the circuits. For small circuits ( less than 50 transistors ) the speedup is about
40 as compared to SPICE. For a larger circuit, such as the pla, the speedup is
over 100.
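
The speedup figures quoted above follow from dividing the SPICE analysis time
by the analysis time with dynamic partitioning. The short sketch below redoes
that arithmetic for four rows of Table III; the dictionary layout is an
assumption, and the pla time is taken as 6.383 s, which is the reading of the
table entry assumed here.

```python
# (dynamic partitioning time, SPICE time) in seconds, from Table III.
analysis_time_s = {
    "ring oscillator (no floating capacitor)": (1.10, 50.13),
    "ring oscillator (with floating capacitor)": (1.17, 45.20),
    "tally circuit": (2.550, 132.0),
    "pla": (6.383, 977.0),   # assumed reading of the table entry
}


def speedup(pwl_time, spice_time):
    """Speedup of the pwl method relative to SPICE."""
    return spice_time / pwl_time


speedups = {name: speedup(*times) for name, times in analysis_time_s.items()}
for name, value in speedups.items():
    print(f"{name}: {value:.1f}")
```

Under these readings the small circuits land in the range of roughly 39 to 52,
and the pla exceeds 100.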

Computation time comparison with respect to RELAX2 [26], which applies the
worst-case partitioning method, is shown in Table IV. The table shows that the
pwl method is more than 10 times faster than RELAX2, even for these
relatively small circuits.

Table IV : Comparison of the pwl method and RELAX3.2

                                               Analysis time
circuit {number of devices}           pwl with dynamic     RELAX3.2
                                      partitioning

5-stage ring oscillator               1.10s                14.02s
( no floating capacitor ) {11}
pla {149}                             6.383s               155.36s

To obtain the rate of growth of computation time vs. the number of devices,
an n-stage ring oscillator circuit is simulated. The CPU times for the
analysis for various n are shown in Table V and plotted in Figure 32. We
observe that the time grows fairly linearly as n increases.


Table V : Computational complexity ( CPU time vs. n )

Number of devices (n)     Analysis time in s


9 


39  9. 


5 


9  18.56 


96  23.183 


9  27.717 


46.383 


From the table above one can conclude that the CPU time taken by the pwl
method ( method 3 ) grows linearly with the circuit size.

The dynamic partitioning method ( method 3 ) implementation on the
Alliant FX/8 is similar to the one for the sequential computer. The difference is
that in the parallel implementation there is no need to partition the circuit into
dc-connected components at the outset. Instead, the partitioning is applied to the
entire circuit. Also, there is no need to sequence the partitioned subcircuits, since
they are either completely decoupled from one another, or a Gauss-Jacobi
method is used when a small capacitive coupling exists between subcircuits.
However, the parallel implementation contains a locking mechanism that the
sequential one does not have.
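
The reason a Gauss-Jacobi sweep suits the parallel case is that every unknown
is updated from the previous sweep only, so the updates are independent of one
another. A minimal sketch on a small, diagonally dominant linear system is
given below; the matrix, the tolerance and the function name are assumptions
chosen for the illustration and are not taken from the program.

```python
def gauss_jacobi(a, b, tol=1e-10, max_sweeps=200):
    """Solve a x = b by Gauss-Jacobi relaxation (a: list of rows).

    Every component of the new iterate uses only the old iterate, so the
    n updates of one sweep could be carried out concurrently.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_sweeps):
        x_new = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Gauss-Jacobi did not converge")


# Weak off-diagonal coupling, analogous to a small floating capacitance.
solution = gauss_jacobi([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0])
print(solution)
```

A Gauss-Seidel sweep would instead use each freshly updated component
immediately, which usually converges in fewer sweeps but forces the updates
into a sequence.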

The following table shows the results of two circuits run on the Alliant
FX/8. The circuits chosen consist of more than 100 transistors and are expected
to show some simulation speed advantage on the parallel computer due to the
presence of large, strongly-connected components in the circuits. Speedup
obtained for the pla circuit is over 600 times as compared to SPICE, while the
speedup of the barrel-shifter circuit is over 400 times as compared to SPICE.
The SPICE results are from a uniprocessor implementation. Compared to 1
processor, using 8 processors is over 3 times faster ( for the pla circuit ), about
5.7 times faster ( for the barrel shifter ) and about 5.3 times faster ( for the
digital filter ); this means that, for the pla, the efficiency of processor
utilization is over 37 percent, for the barrel shifter the efficiency is about 71
percent, and for the digital filter it is 66 percent.

Table VI : Analysis time on the parallel processors

                               Analysis time on Alliant FX/8
circuit           devices   dynamic           dynamic           SPICE
                            partitioning      partitioning
                            (8 processors)    (1 processor)

pla                 149     1.14s             5.433s            977s
barrel shifter      256     1.983s            11.3s             862s
digital filter      698     11.6s             61.31s            -
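
The efficiency figures quoted for the barrel shifter and the digital filter
can be checked by taking efficiency as the parallel speedup divided by the
number of processors. This definition and the names below are assumptions used
for the illustration; the times are those listed in Table VI for these two
circuits.

```python
PROCESSORS = 8

# (time on 8 processors, time on 1 processor) in seconds, from Table VI.
parallel_times_s = {
    "barrel shifter": (1.983, 11.3),
    "digital filter": (11.6, 61.31),
}


def parallel_speedup(t_parallel, t_single):
    """Speedup of the 8-processor run over the 1-processor run."""
    return t_single / t_parallel


def efficiency(t_parallel, t_single, processors=PROCESSORS):
    """Fraction of the ideal linear speedup that is actually obtained."""
    return parallel_speedup(t_parallel, t_single) / processors


for name, (t8, t1) in parallel_times_s.items():
    print(name, round(parallel_speedup(t8, t1), 1), round(100 * efficiency(t8, t1)))
```

With these inputs the speedups come out near 5.7 and 5.3, and the efficiencies
near 71 and 66 percent.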

Another version of the program contains filtering routines to do selective
repartitioning and latency checks. The aim is to do repartitioning and solve
only on some part of the circuit. For the pla circuit the computation time is
reduced from 1.417 seconds to 1.1 seconds. For the barrel shifter no speedup is
obtained; this is due to a large number of repartitioning and solving steps. The
time spent on selective repartitioning and latency checks for the pla is 0.133
seconds ( or about 12 percent of the total CPU time ), while for the barrel
shifter it is

CHAPTER 6


CONCLUSIONS AND FUTURE WORK

The piecewise linear approach, as described in this thesis, is an attractive
method for solving circuit problems. This is due to the fact that simplified
( pwl ) transistor models are used, yielding a lower memory requirement and
faster computation time, and yet the method produces results that are close to
those from other circuit simulators. Moreover, convergence is guaranteed in the
pwl Katzenelson method and its variants.

A pwl MOS transistor model approximation is described in Chapter 2 and
Appendix A. The model contains two parts: namely, the gate-drain and the
gate-source parts. Each part consists of a current source, a resistor and a
dependent current source. The dependent current source is inserted to satisfy the
requirement that the sum of currents at the gate node is equal to zero. Higher
order effects are modeled by the use of tabulated nested functions, as described
in Appendix A. The pwl approximation method is also easily applicable to the
Ebers-Moll bipolar transistor model.

Two pwl methods are also described in Chapter 2. The first is a
modification of the pwl method on simplices. The method is suitable for circuits
that demand more accurate analysis. The idea is similar to the pwl method
developed by Katzenelson, except that in this method, rather than finding boundary
crossings, a vertex to be removed is selected. This vertex removal process is
much simpler than determining which boundary is crossed. The original
method is general and slow. For timing analysis some speedup is obtained by
piecewise linearizing the transistor model. Even after piecewise linearization of
the model and parallelization of the calculation of the matrix entries, the
method is not fast enough compared to the standard circuit simulator SPICE.
The second method is a fast pwl method where the solution is obtained by
partitioning the circuit into one-way subcircuits, where each subcircuit has one
output with one capacitor lumped at the output. The waveform solution at the
output of each subcircuit is found by using a pwl Thevenin's equivalent circuit.
Waveform relaxation is applied when feedback exists among the subcircuits.
The method is fast and fairly accurate for simple circuits such as nand, nor and
inverters. Larger circuits such as pass transistor networks require direct
methods, since the fast pwl method is not accurate enough and tends to become
slow.

Described in the third chapter is a new idea of dynamically partitioning
the pwl circuit. The method is fast because the partitioning is based on
comparing integers representing the gate-source and gate-drain regions of a
transistor. Simulation results on a typical example show that more than two orders
of magnitude speedup is obtained. The dynamic partitioning method is suitable for
solving strongly connected components, or dc-connected subcircuits where the
number of transistors is large. Smaller subcircuits can also be solved this way,
or with the fast pwl method described in Chapter 2, or with the direct method.
Another advantage of the dynamic partitioning method is its suitability for
parallel implementation. This is described in Chapter 4. The local connectivity,
global connectivity and solving of the dynamically partitioned subcircuits can be
performed in the concurrent mode.

The detailed parallel implementation of the dynamic partitioning method
is described in Chapter 4. The simulation of a number of circuit examples
shows good speedup and efficiency of utilizing the parallel processors. Because
the dynamic partitioning method partitions the circuit into completely
decoupled small subcircuits, the gain in computation speed is fairly linear as the
number of available processors increases.

The implementation issues of the program PLATINUM, which is based on
the dynamic partitioning method, are described in Chapter 5. Several waveform
examples and comparisons with respect to other simulators are given to show
the validity of the method.

PLATINUM, as an experimental timing analysis program, shows good results for
MOS circuits. More enhancements to the program are needed; in particular, it
could be extended to handle bipolar circuits. It should not be a difficult task,
since the bipolar transistor model is already in the Ebers-Moll configuration,
which is exactly what is needed for applying the dynamic partitioning method.
Future work would also involve more testing on larger circuits and on other
types of technology such as gallium arsenide circuits. Also, since the method is
highly parallelizable, it would be interesting to implement the method on a
massively parallel machine, similar to the recent parallel implementation of the
relaxation method [48].

An interesting future work is to incorporate the dynamic partitioning
method described in this thesis into a simulator such as RELAX2, which
employs accurate transistor models. Although PLATINUM uses simple pwl
transistor models, iterations are necessary when floating capacitors, such as
from the gate-drain and the gate-source capacitances, exist in the circuit. Since
iterations are performed even for the simple models, it would be a good idea to
use more accurate transistor models.

The questions that need to be answered in this case are

1.  How does the selection of breakpoints affect the number of iterations to
    reach convergence?

2.  If the selection of breakpoints affects the iterations, would it help,
    for accuracy, to have multiple pwl models for each transistor? The
    multiple pwl model is based on a nested tabulated functional representation
    and is described in Appendix A. A tradeoff between speed and accuracy is an
    issue here.

3.  Compared to the heuristic partitioning method currently used in RELAX2,
    how much speedup does the dynamic partitioning provide?

It is possible that when more accuracy is desired the dynamic partitioning
method based on a simple pwl model can be used to partition the circuit, while
more accurate functional models are used in formulating and solving the
equations.




APPENDIX  A 


SHORT  CHANNEL  PWL  TRANSISTOR  MODEL 

This Appendix contains the more complete Meyer's model that includes
the effect of the body-source and body-drain voltages. The simplified Meyer's
model is given in Chapter 2. Instead of using

/  gx  )  =  ^ gx  ~ ^  r  ^ 

one  needs  to  use  [25] : 

/  =  +  ^k(vXB+2**f  )3/2  (A.l) 

$$k = \frac{t_{ox}}{\epsilon_{ox}} \left( 2 \epsilon_s q N \right)^{1/2}$$

where $v_{XB}$ is the x-to-body voltage ( x is either source or drain ), $\phi_f$
is the Fermi potential of the substrate, $\epsilon_s$ is the permittivity of the
substrate, $N$ is the substrate concentration, and $k$ is a constant. Note that
the inclusion of the body effect preserves the one-dimensionality of the tables.
Only one additional table for the body effect is needed to represent the second
part of (A.1) for each device. The short channel effects on the threshold voltage
$V_T$ and the mobility $\mu_{EFF}$ can also be easily included in the tabular
representation. The threshold voltage $V_T$ is given by [55] :

$$V_T = V_{FB} + 2\phi_f - \sigma V_{DS} + \gamma F_s \left( 2\phi_f - V_{BS} \right)^{1/2} + F_n \left( 2\phi_f - V_{BS} \right) \qquad \text{(A.2)}$$

where

$F_s$ = correction factor for short channel effects
$F_n$ = correction factor for narrow channel effects
$\gamma$ = bulk threshold parameter
$\sigma$ = coefficient of static feedback

$$\sigma = \mathrm{ETA} \cdot \frac{\beta}{C_{ox} L^3}$$

where

ETA = constant static feedback effect parameter
$\beta$ = empirical constant
$C_{ox}$ = oxide capacitance
$L$ = channel length

The mobility $\mu_{EFF}$ is defined in two regions as follows:
For the saturation region :

$$\mu_{EFF} = \frac{\mathrm{UO}}{1 + \mathrm{THETA} \, (V_{GS} - V_T)} \qquad \text{(A.3)}$$

while for the linear region :


■UEFF  ~ 


(A. 4) 


V  MAX'LL 


UO is the surface mobility, THETA is the empirical mobility modulation
parameter, and VMAX is the maximum drift velocity of the carriers. After
taking into account the short channel effects the equation becomes as follows :


I Ds  —  A  ( V  Gs X  Ds  )[i  V  GS  l  r!  \  BS.\  DS  .)} 


'  ^  GD  '  as-^  DS ^  ^  —  i:  '  *  IB  ^  ^ 

3 

A(\G5.r/55  )  =  A:£;Tcru-  2 t:xL 


(A.5)

A table based on Equation (A.5) would be multidimensional. To alleviate this
difficulty we choose a set of discrete values of $V_{BS}$, $V_{DS}$ and $V_{GS}$.
Each combination of $V_{BS}$, $V_{DS}$ and $V_{GS}$ is used to obtain the
threshold voltage $V_T$ and the mobility $\mu_{EFF}$. Then for the pairs $V_T$
and $\mu_{EFF}$ we generate a set of values of conductances and current sources
as the piecewise linearized model. For the n-type device the values of $V_{DS}$
used in the table are 0, 1, 2, 3, 4, 5. Similarly, the values for $V_{BS}$ are
-5, -4, -3, -2, -1, 0 and for $V_{GS}$ 0, 1, 2, 3, 4, 5 ( $V_{GS} < 0$ indicates
the device is off ). If more accurate results are desirable then more
combinations of $V_{BS}$, $V_{DS}$ and $V_{GS}$ are used to construct the table.
The advantage of this method is that simple table lookup methods can be used to
incorporate some of the short channel effects.

The above method can be considered as a nested modeling of the device.
First, one determines the values of the variables at the lowest or deepest level
of the equation. In our case the variables are $V_T$ and $\mu_{EFF}$. Then, using
these values, the variables of the higher level, such as the currents and
conductances of the device model, are calculated. Since in our case the
independent variables $V_{BS}$, $V_{DS}$ and $V_{GS}$ for the tables are
determined a priori and the pwl transistor approximations are tabulated in the
preprocessing step, no calculation of a transistor characteristic is performed
during the transient analysis, and hence the computation time is reduced. The
calculation for the device elements is done during the preprocessing step and
the values are stored in a table. This method of nested device modeling is
similar to the one in [56]. The difference is as follows. In [56] the currents
and conductances at various combinations of voltages are tabulated.
Extrapolation and interpolation are necessary for any combination of voltages
outside the tabulated ones. In the approach described here the tabulated
currents and conductances are the results of piecewise linearizing the original
function. As a result, the currents and conductances for all combinations of
voltages are defined, and therefore neither extrapolation nor interpolation is
necessary.
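
To illustrate why a table of piecewise linear segments needs no interpolation, here is a minimal one-dimensional sketch. It is not the PLATINUM implementation: the function names are hypothetical, the device function is reduced to a single voltage, and extending the first and last segments beyond the outer breakpoints is an assumption made here so that every voltage maps to a segment.

```python
from bisect import bisect_right


def pwl_segments(f, breakpoints):
    """Linearize f between consecutive breakpoints.

    Each segment is stored as (g, i0) so that I = g * V + i0,
    i.e. a conductance in parallel with a current source.
    """
    segs = []
    for v0, v1 in zip(breakpoints, breakpoints[1:]):
        g = (f(v1) - f(v0)) / (v1 - v0)
        i0 = f(v0) - g * v0
        segs.append((g, i0))
    return segs


def pwl_lookup(breakpoints, segs, v):
    """Return the (g, i0) pair of the segment containing v.

    Assumption: voltages outside the breakpoint range reuse the
    nearest end segment, so the lookup is defined for all v.
    """
    k = bisect_right(breakpoints, v) - 1
    k = min(max(k, 0), len(segs) - 1)
    return segs[k]


# Usage: a quadratic stands in for a device characteristic.
breakpoints = [0, 1, 2, 3]
segs = pwl_segments(lambda v: v * v, breakpoints)
g, i0 = pwl_lookup(breakpoints, segs, 1.5)
current = g * 1.5 + i0
```

During analysis only the lookup runs; the segment values themselves are computed once in the preprocessing step.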

To study the accuracy of the pwl approximation method described above, a
CMOS latch with short channel transistors (1 micron length) is analyzed
(Figure 33). SPICE outputs using the simple model (level 1) and the
semiempirical model (level 3) are shown in Figure 34. There is a noticeable
difference between the SPICE outputs for level 1 and level 3. Figure 34
also shows the outputs using the pwl model. There is good agreement between
the output of SPICE level 3 and the output using the pwl model.

APPENDIX B


DESCRIPTION  OF  THE  PROGRAM  PLATINUM 


This appendix contains information on how to use PLATINUM: Piecewise
LineAr TImiNg simulation for Mos circuits. The input to the program,
referred to as the circuit input file, is similar to the input file for the
program PREMOS, a simulator developed by Wei [43]. PLATINUM is more
general than PREMOS. Some of the features of PLATINUM are

1. It handles circuits described at the transistor or subcircuit level.

2. It has a built-in table for a typical pwl MOS driver, pull-up and pass
transistor. The input file may contain user-specified transistor parameters
which are used by the program to generate new pwl tables.

3. Capacitors are specified either from a node to ground or from a node to
another node.

The types of subcircuits that can be handled by PLATINUM are nand, nor,
and-or-inverter and pass transistor network. The model is described as

MODEL modnam type (parameters)

where modnam is a user-specified name, type is any one of the following:
nand, nor, and-or-inverter, pass transistor, voltage source, and (parameters)
is a set of appropriate parameters. The appropriate parameters for each type
are (please refer to Figures B1-B4):

TYPE    PARAMETERS
nand    wla wll ca ci cl
nor     wlo wll co cl
andoi   wla wlo wll ca co ci cl na no
trans   wlo wll wlt co cl cg ct no ni
sourc   v1 v0 t0 tr t1 tf

For example, MODEL nd2 NAND (1 0.2 10f 10f 100f)
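
A model card of this form can be read with a few lines of code. The sketch below is illustrative only and is not taken from PLATINUM: `parse_value` and `parse_model` are hypothetical names, and the scale suffixes are assumed to follow the usual SPICE convention (f = 1e-15, n = 1e-9, and so on), which the appendix uses but does not define.

```python
import re

# Assumed SPICE-style scale suffixes; the appendix uses "f" and "n"
# without defining them.
SUFFIX = {"f": 1e-15, "p": 1e-12, "n": 1e-9, "u": 1e-6, "m": 1e-3}

MODEL_RE = re.compile(
    r"\s*model\s+(\S+)\s+([A-Za-z0-9-]+)\s*(?:\((.*)\))?\s*$",
    re.IGNORECASE,
)


def parse_value(token):
    """Convert a token such as '10f' or '0.2' to a float."""
    token = token.strip().lower()
    if len(token) > 1 and token[-1] in SUFFIX:
        return float(token[:-1]) * SUFFIX[token[-1]]
    return float(token)


def parse_model(line):
    """Split a model card into (name, type, parameter values)."""
    match = MODEL_RE.match(line)
    if match is None:
        raise ValueError("not a model card: " + line)
    name, mtype, params = match.groups()
    values = [parse_value(t) for t in params.split()] if params else []
    return name, mtype.lower(), values


# Usage on the example card above.
name, mtype, values = parse_model("MODEL nd2 NAND (1 0.2 10f 10f 100f)")
```

A card without a parameter list, such as a pass transistor model, yields an empty list of values.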

The models are used in the circuit description. The circuit description is of
the form

name node1 node2 ... modnam

"name" is the name of the circuit element, "node1 node2 ..." contains the
node connections, and "modnam" is one of the model names. The node
connections must follow the order given below:


TYPE    ORDER OF NODE NUMBERS
nand    a1 a2 i l
nor     o1 o2 l
andoi   a1 a2 ... o1 o2 ... l i1 i2 ... i(na-1)
capcr   n1 n2
sourc   n+ n-
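
As a consistency check on the andoi node order, the number of nodes an element line should carry can be computed from the model parameters. The sketch below rests on assumptions that the appendix does not state explicitly: na and no are taken to be the numbers of AND and OR inputs, l is a single output node, and the internal nodes run from i1 to i(na-1). The function names are hypothetical.

```python
def andoi_node_count(na, no):
    """Expected node count for an andoi element.

    Assumed order: a1..a(na), o1..o(no), l, i1..i(na-1).
    """
    internal = max(na - 1, 0)
    return na + no + 1 + internal


def check_andoi_element(line, na, no):
    """True if the element line lists the expected number of nodes.

    The first field is the element name and the last is the model name.
    """
    fields = line.split()
    nodes = fields[1:-1]
    return len(nodes) == andoi_node_count(na, no)


# Usage: a three-input NOR built from an andoi model with na=0, no=3.
ok = check_andoi_element("x1 11 17 19 1 nor3", na=0, no=3)
```

Under these assumptions a model with na = 0 and no = 3 takes four nodes: three inputs followed by the output.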

Besides the subcircuit level, a circuit can be described at the transistor
level. The format for the transistor level description is

name drain gate source body trantype

where "name" is the name of the transistor element, "drain, gate, source, body"
are the MOS nodes, and "trantype" is the transistor type (such as PASS,
DRIVER, LOAD). Drain and source nodes are interchangeable.

Besides the circuit description, the input file also contains option commands.
The available options are

time tstop tstep

time is the command, tstop is the length of analysis time, and tstep is the
time step.

preset (n1,v1) (n2,v2) ...

preset is the command to preset a node to a specific voltage at the beginning
of simulation. n1, n2, ... are the node numbers, and v1, v2, ... are the node
voltages.

send n1 n2 ...

send is the command to print out the node voltages. n1 n2 ... are the node
numbers.

table (w,l,kappa,vt,v1,v2,v3,n)

table is the command to generate a new table with user-specified parameters.
w is the width, l is the length, kappa is the transconductance parameter, vt is
the threshold voltage, v1, v2, v3 are the selected voltage breakpoints, and n
is the type of transistor (n=1 is for driver, n=2 is for load and n=3 is for
pass transistor).

end

end is a command indicating the end of the input file.
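
The option commands are simple enough to collect with a small reader. The sketch below is a hypothetical illustration, not PLATINUM code: `parse_options` is an invented name, only time, preset, send and end are handled, other lines are skipped, and the time arguments are kept as strings so that no meaning is assumed for suffixes such as "n".

```python
import re

PRESET_PAIR = re.compile(r"\(\s*(\d+)\s*,\s*([^)\s]+)\s*\)")


def parse_options(lines):
    """Collect time, preset and send commands until 'end' is seen."""
    opts = {"time": None, "preset": {}, "send": []}
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        cmd = fields[0].lower()
        if cmd == "time":
            # Kept as strings: (tstop, tstep).
            opts["time"] = (fields[1], fields[2])
        elif cmd == "preset":
            for node, volt in PRESET_PAIR.findall(raw):
                opts["preset"][int(node)] = float(volt)
        elif cmd == "send":
            opts["send"].extend(int(n) for n in fields[1:])
        elif cmd == "end":
            break
        # Any other line (element, model, comment) is ignored here.
    return opts


# Usage on a short, made-up command sequence.
opts = parse_options([
    "preset (35,0) (36,0)",
    "time 120n 1n",
    "send 37 38 39 40 41",
    "end",
    "send 99",
])
```

Anything after the end command is ignored, matching its role as the end-of-file marker.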

An example of a complete input file is given next. It is a PLA circuit that is
referred to in the thesis.


PLA finite-state machine implementing the light controller
*subcircuit model card
model inv nor2 (5 1 10f 100f)
model nor3 andoi(5 5 1 10f 10f 10f 100f 0 3)
model nor4 andoi(5 5 1 10f 10f 10f 100f 0 4)
model notr1 trans(5 1 2 10f 100f 10f 50f 1 1)
model notr2 trans(5 1 2 10f 100f 10f 50f 2 1)
model notr4 trans(5 1 2 10f 100f 10f 50f 4 1)
model notr5 trans(5 1 2 10f 100f 10f 50f 5 1)
model pass passt
model cap capcr(50f)
model clk1 source (4 1 10n 5n 10n 5n)
model clk2 source (5 0 5n 5n 5n 5n)

* AND plane
x1 11 17 19 1 nor3
x2 13 17 19 2 nor3
x3 12 14 17 19 3 nor4
x4 15 18 19 4 nor3
x5 16 18 19 5 nor3
x6 12 13 18 20 6 nor4
x7 11 18 20 7 nor3
x8 14 18 20 8 nor3
x9 15 17 20 9 nor3
x10 16 17 20 10 nor3

* OR plane
x11 5 6 7 8 9 21 notr5
x44 21 56 28 pass
x12 3 4 5 6 22 notr4
x45 22 56 29 pass
x13 3 5 7 8 10 23 notr5
x46 23 56 30 pass
x14 6 7 8 9 10 24 notr5
x47 24 56 31 pass
x15 4 5 25 notr2
x48 25 56 32 pass
x16 1 2 3 4 5 26 notr5
x49 26 56 33 pass
x17 9 10 27 notr2
x50 27 56 34 pass

* output registers
x18 28 35 notr1
x51 35 55 49 pass
x19 29 36 notr1
x52 36 55 48 pass
x20 30 30 37 inv
x21 31 31 38 inv
x22 32 32 39 inv
x23 33 33 40 inv
x24 34 34 41 inv

* capacitors of pass trans
x56 28 0 cap
x57 29 0 cap
x58 30 0 cap
x59 31 0 cap
x60 32 0 cap
x61 33 0 cap
x62 34 0 cap
x66 48 0 cap
x67 49 0 cap

* input buffers
x25 57 42 notr1
x53 42 55 45 pass
x26 58 43 notr1
x54 43 55 46 pass
x27 59 44 notr1
x55 44 55 47 pass
x63 45 0 cap
x64 46 0 cap
x65 47 0 cap

* input registers
x28 45 45 50 inv
x29 46 46 51 inv
x30 47 47 52 inv
x31 48 48 53 inv
x32 49 49 54 inv
x33 50 50 11 inv
x34 45 45 12 inv
x35 51 51 13 inv
x36 46 46 14 inv
x37 52 52 15 inv
x38 47 47 16 inv
x40 53 53 17 inv
x41 48 48 18 inv
x42 54 54 19 inv
x43 49 49 20 inv
*input sources
va1 55 0 clk1 0 1 0 0 0 1 0 0 0 1 0 0 0 1
va2 56 0 clk1 0 0 0 1 0 0 0 1 0 0 0 1 0 0
va0 57 0 clk2 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1
vb0 58 0 clk2 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
vc0 59 0 clk2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
*analysis requests
preset (35,0) (36,0)
time 120n 1n
send 37 38 39 40 41
v- 5
end